CRYSTALLOGRAPHIC STRUCTURE OF THE ANDROGEN RECEPTOR

LIGAND BINDING DOMAIN 

Field of Invention 

The present invention relates to compositions and crystals of the
androgen receptor ligand binding domain, optionally in complex with its
ligand. This invention also relates to methods of using the structure
coordinates of the androgen receptor ligand binding domain/ligand
complex to solve the structure of similar or homologous proteins or
protein complexes. This invention also relates to methods for designing
and selecting ligands that bind to the androgen receptor and methods of
using such ligands.

Background of the Invention

The androgen receptor (AR) is a member of the steroid nuclear-receptor
superfamily of ligand-dependent transcription factors. The binding of
androgen to AR initiates the gene activation required for male sex
development.

AR is an important target primarily in two drug discovery areas.
In oncology drug discovery, inhibitors (antagonists or partial antagonists)
of androgen receptor function are useful for treatment of anti-androgen
refractory prostate cancer. In metabolic diseases drug discovery, agonists
or partial agonists to the androgen receptor in muscle are useful to treat
age-related diseases.

As with the other members of the steroid receptor family, AR has
several functional domains including a DNA binding domain (DBD), and
a 261 residue ligand-binding domain (LBD) (Mw = 30,245 Da) which
contains the androgen binding site, and is responsible for switching on
the androgen function.

Development of synthetic ligands that specifically bind to
androgen receptors has been largely guided by trial-and-error methods of
drug design, despite the importance of the androgen receptor in
physiological processes and medical conditions such as prostate cancer
and modulation of reproductive organs. Previously, new
ligands specific for androgen receptors were discovered in the absence of
information on the three dimensional structure of the androgen receptor
with a bound ligand. Before the present invention, researchers were
essentially discovering androgen receptor ligands by probing in the dark
and without the ability to visualize how the amino acids of the androgen
receptor held a ligand in its grasp.

Consequently, it would be advantageous to devise methods and
compositions for reducing the time required to discover ligands to the
androgen receptor, synthesize such compounds and administer such
compounds to organisms to modulate physiological processes regulated
by the androgen receptor.

The cDNA and amino acid sequences of human and rat
androgen receptors have been described (Proc. Natl. Acad. Sci. U.S.A.
1988 85: 7211-7215). However, there have been no
crystals reported of any androgen receptor. Thus, x-ray crystallographic
analysis of such proteins has not been possible.

We have discovered the first crystal structure of the androgen
receptor ligand binding domain (AR-LBD). Our understanding of the
androgen receptor structure has allowed for the determination of the
ligand binding site for selective androgen receptor modulators (SARMs).

Summary of the Invention

The present invention provides crystals of AR-LBD and crystals
of an AR-LBD bound to a ligand, i.e. an AR-LBD/AR-LBD ligand
complex. Most preferably the AR-LBD ligand is dihydrotestosterone
(DHT). Thus, the present invention is directed to a crystal of an AR-LBD
comprising:

1) an AR-LBD and an AR-LBD ligand or
2) an AR-LBD without an AR-LBD ligand;

wherein said crystal diffracts to at least 3 angstrom resolution and has a
crystal stability within 5% of its unit cell dimensions. The crystal of AR
or AR-LBD preferably has at least 200 amino acids and preferably
comprises amino acid sequence 672 to 917 of rat AR or the AR amino
acid sequence 672 to 917 of human AR.

The present invention also provides the structure coordinates of
the AR-LBD/AR-LBD ligand complex. The complete coordinates are
listed in Table A.

The present invention also provides a method for determining at
least a portion of the three-dimensional structure of molecules or
molecular complexes which contain at least some structurally similar
features to the androgen receptor ligand binding domain. It is preferred
that these molecules or molecular complexes comprise at least a part of
the ligand binding site defined by structure coordinates of AR-LBD
amino acids V685, L700, L701, S702, S703, L704, N705, E706, L707,
G708, E709, Q711, A735, I737, Q738, Y739, S740, W741, M742, G743,
L744, M745, V746, F747, A748, M749, G750, R752, Y763, F764, A765,
L768, F770, M780, M787, I869, L873, H874, F876, T877 and F878
according to Table A, or a mutant or homologue thereof. Since the
protein sequences for rat and human AR LBD are identical, the human
numbering system was used herein.

The present invention also provides a machine-readable data
storage medium which comprises a data storage material encoded with
machine readable data defined by the structure coordinates of an
AR-LBD/AR-LBD ligand or ligand complex according to Table A or a
homologue of the complex.

The present invention further provides a binding site in AR-LBD
for an AR-LBD ligand as well as methods for designing or selecting AR
modulators including agonists, partial agonists, antagonists, partial
antagonists and/or selective androgen receptor modulators (SARMs) of
AR using information about the crystal structures disclosed herein.

Brief Description of the Drawing

Figure 1 is a ribbon style drawing of the Androgen Receptor
LBD. The substrate DHT is shown as a ball-and-stick figure.

Figure 2 is a comparison of the androgen receptor ligand
binding domain with progesterone receptor ligand binding domain.

Figure 3 provides three views of the omit electron density map of
dihydrotestosterone (DHT) in the hormone-binding site of AR-LBD. There
are hydrogen bonds between the steroid and the side chains of Arg 752
and Asn 705.

Figure 4 is a comparison of the binding of dihydrotestosterone to
AR-LBD (top) and of progesterone to PR-LBD (bottom). Note that an
additional hydrogen bond interaction would be possible if the
sidechain of N719 and the progesterone were both flipped.

Detailed Description of the Invention

The first crystal structure of the androgen receptor ligand
binding domain (AR-LBD) has been determined to 2.0 Å resolution.
Crystals of rat AR-LBD were grown from precipitating solutions
containing 0.9 M sodium tartrate, 0.1 M Na Hepes, pH 7.5. X-ray
diffraction from the crystals has the symmetry and systematic absences
of the orthorhombic space group P212121 with unit cell dimensions a =
56.03 Å, b = 66.27 Å, c = 70.38 Å, and one molecule per asymmetric
unit (Matthews volume = 2.16 Å³ Da⁻¹). The structure was determined by
the method of molecular replacement using the structure of the
Progesterone Receptor LBD (PR-LBD) as the search model.
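
The Matthews volume quoted above can be reproduced from the unit cell dimensions and the molecular weight. The following is a minimal sketch, assuming four molecules per unit cell (P212121 has four asymmetric units, each stated to hold one molecule) and the 30,245 Da molecular weight given in the Background for the 261 residue LBD; the crystallized construct may differ slightly, and the function name is illustrative only.

```python
def matthews_volume(a, b, c, mol_weight_da, molecules_per_cell):
    """Return the Matthews volume in cubic angstroms per dalton.

    The cell volume is taken as a * b * c, which is valid only for a cell
    with all angles equal to 90 degrees (e.g. orthorhombic).
    """
    cell_volume = a * b * c
    return cell_volume / (molecules_per_cell * mol_weight_da)


# Assumed Z = 4: four asymmetric units per P212121 cell, one molecule in each.
vm = matthews_volume(56.03, 66.27, 70.38, 30245.0, 4)
print(f"{vm:.2f}")  # 2.16
```
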



The complex of AR-LBD with dihydrotestosterone (DHT) shows
the mode of binding of the steroid to the receptor in the agonist
conformation.



[Structure drawing: Dihydrotestosterone]

The following abbreviations are used throughout the application:

A = Ala = Alanine
V = Val = Valine
L = Leu = Leucine
I = Ile = Isoleucine
P = Pro = Proline
F = Phe = Phenylalanine
W = Trp = Tryptophan
M = Met = Methionine
G = Gly = Glycine
S = Ser = Serine
T = Thr = Threonine
C = Cys = Cysteine
Y = Tyr = Tyrosine
N = Asn = Asparagine
Q = Gln = Glutamine
D = Asp = Aspartic Acid
E = Glu = Glutamic Acid
K = Lys = Lysine
R = Arg = Arginine
H = His = Histidine

"Atom type" refers to the element whose coordinates have been
determined. Elements are defined by the first letter in the column.

"X, Y, Z" crystallographically define the atomic position determined for
each atom.

"B" is a thermal factor that measures movement of the atom around its
atomic center.

"Occ" is an occupancy factor that refers to the fraction of the molecules
in which each atom occupies the position specified by the coordinates. A
value of "1" indicates that each atom has the same conformation, i.e., the
same position, in all molecules of the crystal.
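
The column definitions above map naturally onto a small record type. The sketch below is only illustrative: Table A is not reproduced in this section, so it assumes the coordinates are laid out as PDB-style fixed-column ATOM records, and the example line is made up rather than taken from Table A.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    atom_name: str       # e.g. "CA"; its first letter gives the element
    element: str
    residue_name: str
    residue_number: int
    x: float             # atomic position in angstroms
    y: float
    z: float
    occ: float           # occupancy factor
    b: float             # thermal (B) factor


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one PDB-style ATOM line using the standard fixed columns."""
    name = line[12:16].strip()
    return AtomRecord(
        atom_name=name,
        element=name[0],
        residue_name=line[17:20].strip(),
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occ=float(line[54:60]),
        b=float(line[60:66]),
    )


# Hypothetical record, assembled field by field so the columns line up.
example = (
    "ATOM  " + f"{1:>5}" + " " + f"{' N':<4}" + " " + "ASN" + " " + "A"
    + f"{705:>4}" + "    "
    + f"{12.345:8.3f}{23.456:8.3f}{34.567:8.3f}"
    + f"{1.00:6.2f}{20.50:6.2f}"
)
atom = parse_atom_line(example)
```

An occupancy of 1.0 in such a record corresponds to the case described above in which the atom has the same position in all molecules of the crystal.
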

Additional definitions are set forth in the specification where necessary. 

The androgen receptor (AR) described herein is intended to
include any polypeptide which has the activity of the naturally occurring
androgen receptor. The AR and AR-LBD contemplated herein include
all vertebrate and mammalian forms such as rat, mouse, pig, goat,
horse, guinea pig, rabbit, monkey, orangutan and human. Such terms
also include polypeptides that differ from naturally occurring forms of
AR and AR-LBD by having amino acid deletions, substitutions, and
additions, but which retain the activity of AR and AR-LBD, respectively.
The crystal structure of the invention preferably contains at least 25%,
more preferably at least 50%, more preferably at least 75%, more
preferably at least 90%, more preferably at least 95%, more preferably at
least 99%, and most preferably all of the coordinates listed in Table A.
The crystal of the AR-LBD/AR-LBD ligand of the invention preferably
has the following unit cell dimensions in angstroms: a = 56.03 ± 5%, b
= 66.27 ± 5%, c = 70.38 ± 5% and an orthorhombic space group
P212121.
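
The ± 5% window on the unit cell dimensions can be checked mechanically. This is a minimal sketch under the assumption that the tolerance applies to each axis independently; the function and variable names are hypothetical.

```python
# Reference cell edges in angstroms, as stated in the text.
REFERENCE_CELL = {"a": 56.03, "b": 66.27, "c": 70.38}


def cell_within_tolerance(cell, reference=REFERENCE_CELL, tolerance=0.05):
    """Return True if every axis is within +/- tolerance (as a fraction)
    of the corresponding reference axis length."""
    return all(
        abs(cell[axis] - ref) <= tolerance * ref
        for axis, ref in reference.items()
    )


# Made-up measurements for illustration.
print(cell_within_tolerance({"a": 57.0, "b": 66.0, "c": 71.5}))    # True
print(cell_within_tolerance({"a": 60.0, "b": 66.27, "c": 70.38}))  # False
```
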

The AR-LBD ligand of this invention is any peptide, peptide
mimetic or nonpeptide, including small organic molecules, that is
capable of acting as a ligand for AR-LBD. In a preferred embodiment,
the AR-LBD ligand is an AR modulator. By "AR modulator" it is meant
an agonist or activator, a partial agonist or partial activator, an
antagonist or inhibitor, or a partial antagonist or partial inhibitor which
demonstrates tissue-specific activation of the AR. Such compounds are
also referred to herein as SARMs (selective androgen receptor
modulators) of the AR-LBD. Examples of preferred agonists include
androgens such as dihydrotestosterone.

The peptides referred to herein (e.g., AR, AR-LBD, and the like)
may be produced by any well-known method, including synthetic
methods, such as solid phase, liquid phase and combination solid
phase/liquid phase syntheses; recombinant DNA methods, including
cDNA cloning, optionally combined with site directed mutagenesis;
and/or purification of the natural products, optionally combined with
enzymatic cleavage methods to produce fragments of naturally occurring

Advantageously, the crystallizable compositions provided by this
invention are amenable to x-ray crystallography. Thus, this invention
also provides the three-dimensional structure of the AR-LBD/AR-LBD
ligand complex, particularly the complex of rat AR-LBD with
dihydrotestosterone.

The three-dimensional structure of the AR-LBD/dihydrotestosterone
complex of this invention is defined by a set of
structure coordinates as set forth in Table A. The term "structure
coordinates" refers to Cartesian coordinates derived from mathematical
equations related to the patterns obtained on diffraction of a
monochromatic beam of X-rays by the atoms (scattering centers) of an
androgen receptor/dihydrotestosterone complex in crystal form. The
diffraction data are used to calculate an electron density map of the
repeating unit of the crystal. The electron density maps are then used to
establish the positions of the individual atoms of the complex.

Those of skill in the art will understand that a set of structure
coordinates for a receptor or receptor/ligand complex or a portion
thereof, is a relative set of points that define a shape in three
dimensions. Thus, it is possible that an entirely different set of
coordinates could define a similar or identical shape. Moreover, slight
variations in the individual coordinates will have little effect on overall
shape.

The variations in coordinates discussed above may be generated
because of mathematical manipulations of the structure coordinates. For
example, the structure coordinates set forth in Table A could be
manipulated by crystallographic permutations of the structure
coordinates, fractionalization of the structure coordinates, integer
additions or subtractions to sets of the structure coordinates, inversion
of the structure coordinates or any combination of the above.

Alternatively, modifications in the crystal structure due to
mutations, additions, substitutions, and/or deletions of amino acids, or
other changes in any of the components that make up the crystal could
also account for variations in structure coordinates. If such variations
are within an acceptable standard error as compared to the original
coordinates, the resulting three-dimensional shape is considered to be
the same.

Various computational analyses are therefore necessary to
determine whether a molecule or molecular complex or a portion thereof
is sufficiently similar to all or parts of the androgen
receptor/dihydrotestosterone complex described above as to be considered
the same. Such analyses may be carried out in current software
applications, such as the Molecular Similarity application of QUANTA
(Molecular Simulations Inc., San Diego, CA) version 4.1, and as
described in the accompanying User's Guide.

The Molecular Similarity application permits comparisons
between different structures, different conformations of the same
structure, and different parts of the same structure. The procedure used
in Molecular Similarity to compare structures is divided into four steps:
1) load the structures to be compared; 2) define the atom equivalences in
these structures; 3) perform a fitting operation; and 4) analyze the
results.

Each structure is identified by a name. One structure is
identified as the target (i.e., the fixed structure); all remaining structures
are working structures (i.e., moving structures). Since atom equivalency
within QUANTA is defined by user input, for the purpose of this
invention we will define equivalent atoms as protein backbone atoms (N,
Cα, C and O) for all conserved residues between the two structures being
compared. We will also consider only rigid fitting operations.

When a rigid fitting method is used, the working structure is
translated and rotated to obtain an optimum fit with the target
structure. The fitting operation uses an algorithm that computes the
optimum translation and rotation to be applied to the moving structure,
such that the root mean square difference of the fit over the specified
pairs of equivalent atoms is an absolute minimum. This number, given in
angstroms, is reported by QUANTA.
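
The rigid fitting step can be sketched outside QUANTA. The text does not say which algorithm QUANTA uses, so the example below substitutes the Kabsch method (optimal rotation from a singular value decomposition) as one standard way to find the least-squares rotation and translation. It assumes NumPy is available and that equivalent atoms are already paired row by row; the function name and the coordinates are made up for illustration.

```python
import numpy as np


def rigid_fit_rmsd(target, moving):
    """Superimpose `moving` onto `target` (both N x 3, atoms paired row by
    row) with the least-squares rotation and translation, and return the
    root mean square difference in the units of the input coordinates."""
    target = np.asarray(target, dtype=float)
    moving = np.asarray(moving, dtype=float)
    if target.shape != moving.shape or target.ndim != 2 or target.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of paired atoms")
    # Optimal translation: move both centroids to the origin.
    t = target - target.mean(axis=0)
    m = moving - moving.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(m.T @ t)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # forbid reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = m @ rotation.T
    return float(np.sqrt(((fitted - t) ** 2).sum() / len(t)))


# Made-up target, and the same shape rotated 90 degrees about z and shifted.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
moving = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in target]
fit = rigid_fit_rmsd(target, moving)  # effectively zero for identical shapes
```

Because only a rotation and a translation are applied, internal distances in the moving structure are preserved, which is what distinguishes rigid fitting from flexible fitting.
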

For the purpose of this invention, any molecule or molecular
complex that has a root mean square deviation of conserved residue
backbone atoms (N, Cα, C, O) of less than 1.5 Å when superimposed on
the relevant backbone atoms described by structure coordinates listed in
Table A is considered identical. More preferably, the root mean square
deviation is less than 1.0 Å. In a preferred embodiment of the present
invention, the molecule or molecular complex comprises at least a
portion of the ligand binding site defined by structure coordinates of
AR-LBD amino acids V685, L700, L701, S702, S703, L704, N705, E706,
L707, G708, E709, Q711, A735, I737, Q738, Y739, S740, W741, M742,
G743, L744, M745, V746, F747, A748, M749, G750, R752, Y763, F764,
A765, L768, F770, M780, M787, I869, L873, H874, F876, T877 and
F878 according to Table A, or a mutant or homologue of said molecule or
molecular complex. More preferred are molecules or molecular
complexes comprising all or any part of the ligand binding site defined by
structure coordinates of AR-LBD amino acids N705, Q711, R752, F764
and T877 according to Table A, or a mutant or homologue of said
molecule or molecular complex. Since the protein sequences for rat and
human AR LBD are identical, the human numbering system has been
used herein.

The term "complex" or "molecular complex" means AR-LBD or a
mutant or homologue of AR-LBD in a covalent or non-covalent
association with a chemical entity or compound.

For purposes of the present invention, by "at least a portion of"
it is meant all or any part of the ligand binding site defined by these
structure coordinates.

By "mutant or homologue" as used herein it is meant a molecule
or molecular complex having a similar structure and/or sequence to
AR-LBD. By "similar structure" it is meant a mutant or homologue
having a binding pocket that has a root mean square deviation from the
backbone atoms of said AR-LBD amino acids of not more than 1.5
Angstroms. By "similar sequence" it is meant a mutant or homologue
having 30%, or more preferably 75%, identity with AR-LBD.

The term "root mean square deviation" means the square root of
the arithmetic mean of the squares of the deviations from the mean. It is
a way to express the deviation or variation from a trend or object. For
purposes of this invention, the "root mean square deviation" defines the
variation in the backbone of a protein or protein complex from the
relevant portion of the backbone of the AR portion of the complex as
defined by the structure coordinates described herein.
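
As a worked illustration of this definition, the sketch below computes the root mean square deviation between two sets of backbone atom positions that are assumed to be already superimposed and paired one to one. The 1.5 Å default cutoff is the threshold named in this specification; the function names and coordinates are hypothetical.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) positions.

    Both inputs must list equivalent atoms in the same order and are
    assumed to be superimposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_similar_structure(rmsd, cutoff=1.5):
    """Apply the 1.5 angstrom criterion used in the text."""
    return rmsd <= cutoff


# Made-up positions: deviations of 0.6 and 0.8 angstroms for two atoms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
candidate = [(0.6, 0.0, 0.0), (3.8, 0.8, 0.0)]
value = backbone_rmsd(reference, candidate)
```

Here the squared deviations are 0.36 and 0.64, so the mean is 0.5 and the RMSD is its square root, roughly 0.71 Å, which falls under the cutoff.
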

Once the structure coordinates of a protein crystal have been
determined they are useful in solving the structures of other crystals.
Thus, in accordance with the present invention, the structure
coordinates of an androgen receptor/dihydrotestosterone complex, and
in particular a complex and portions thereof, are stored in a
machine-readable storage medium. Such data may be used for a variety of
purposes, such as drug discovery and x-ray crystallographic analysis of
protein crystals.

Accordingly, in one embodiment of this invention is provided a
machine-readable data storage medium comprising a data storage
material encoded with the structure coordinates set forth in Table A.
One embodiment utilizes System 10 as disclosed in WO
98/11134, the disclosure of which is incorporated herein by reference in
its entirety.

For the first time, the present invention permits the use of
structure-based or rational drug design techniques to design, select, and
synthesize chemical entities, including inhibitory and stimulatory
compounds that are capable of binding to AR-LBD, or any portion
thereof.

One particularly useful drug design technique enabled by this
invention is iterative drug design. Iterative drug design is a method for
optimizing associations between a protein and a compound by
determining and evaluating the three-dimensional structures of
successive sets of protein/compound complexes.

Those of skill in the art will realize that association of natural
ligands or substrates with the binding pockets of their corresponding
receptors or enzymes is the basis of many biological mechanisms of
action. The term "binding pocket" as used herein, refers to a region of a
molecule or molecular complex, that, as a result of its shape, favorably
associates with another chemical entity or compound. Similarly, many
drugs exert their biological effects through association with the binding
pockets of receptors and enzymes. Such associations may occur with all
or any parts of the binding pockets. An understanding of such
associations will help lead to the design of drugs having more favorable
associations with their target receptor or enzyme, and thus, improved
biological effects. Therefore, this information is valuable in designing
potential ligands or inhibitors of receptors or enzymes, such as inhibitors
of AR.

The term "associating with" refers to a condition of proximity
between chemical entities or compounds, or portions thereof. The
association may be non-covalent, wherein the juxtaposition is
energetically favored by hydrogen bonding or van der Waals or
electrostatic interactions, or it may be covalent.

In iterative drug design, crystals of a series of
protein/compound complexes are obtained and then the
three-dimensional structure of each complex is solved. Such an approach
provides insight into the association between the proteins and
compounds of each complex. This is accomplished by selecting
compounds with inhibitory activity, obtaining crystals of this new
protein/compound complex, solving the three dimensional structure of
the complex, and comparing the associations between the new
protein/compound complex and previously solved protein/compound
complexes. By observing how changes in the compound affected the
protein/compound associations, these associations may be optimized.

In some cases, iterative drug design is carried out by forming
successive protein-compound complexes and then crystallizing each new
complex. Alternatively, a pre-formed protein crystal is soaked in the
presence of an inhibitor, thereby forming a protein/compound complex
and obviating the need to crystallize each individual protein/compound
complex.

As used herein, the term "soaked" refers to a process in which 
the crystal is transferred to a solution containing the compound of 
interest. 

The structure coordinates set forth in Table A can also be used
to aid in obtaining structural information about another crystallized
molecule or molecular complex. This may be achieved by any of a
number of well-known techniques, including molecular replacement.

The structure coordinates set forth in Table A can also be used
for determining at least a portion of the three-dimensional structure of
molecules or molecular complexes which contain at least some
structurally similar features to AR. In particular, structural information
about another crystallized molecule or molecular complex may be
obtained. This may be achieved by any of a number of well-known
techniques, including molecular replacement.

Therefore, in another embodiment this invention provides a
method of utilizing molecular replacement to obtain structural
information about a crystallized molecule or molecular complex whose
structure is unknown comprising the steps of:

a) generating an X-ray diffraction pattern from said crystallized molecule
or molecular complex;

b) applying at least a portion of the structure coordinates set forth in
Table A to the X-ray diffraction pattern to generate a three-dimensional
electron density map of the molecule or molecular complex whose
structure is unknown; and

c) using all or a portion of the structure coordinates set forth in Table A
to generate homology models of AR-LBD or any other nuclear hormone
receptor ligand binding domain.

Preferably, the crystallized molecule or molecular complex is
obtained by soaking a crystal of this invention in a solution.

By using molecular replacement, all or part of the structure
coordinates of the AR-LBD/AR-LBD ligand complex provided by this
invention can be used to determine the structure of a crystallized
molecule or molecular complex whose structure is unknown more
quickly and efficiently than attempting to determine such information ab
initio.

Molecular replacement provides an accurate estimation of the
phases for an unknown structure. Phases are a factor in equations used
to solve crystal structures that cannot be determined directly. Obtaining
accurate values for the phases, by methods other than molecular
replacement, is a time-consuming process that involves iterative cycles of
approximations and refinements and greatly hinders the solution of
crystal structures. However, when the crystal structure of a protein
containing at least a homologous portion has been solved, the phases
from the known structure provide a satisfactory estimate of the phases
for the unknown structure.

Thus, this method involves generating a preliminary model of a
molecule or molecular complex whose structure coordinates are
unknown, by orienting and positioning the relevant portion of the
AR-LBD/AR-LBD ligand complex according to Table A within the unit cell of
the crystal of the unknown molecule or molecular complex so as best to
account for the observed X-ray diffraction pattern of the crystal of the
molecule or molecular complex whose structure is unknown. Phases can
then be calculated from this model and combined with the observed
X-ray diffraction pattern amplitudes to generate an electron density map of
the structure whose coordinates are unknown. This, in turn, can be
subjected to any well-known model building and structure refinement
techniques to provide a final, accurate structure of the unknown
crystallized molecule or molecular complex [E. Lattman, "Use of the
Rotation and Translation Functions", in Meth. Enzymol., 115, pp. 55-77
(1985); M. G. Rossmann, ed., "The Molecular Replacement Method", Int.
Sci. Rev. Ser., No. 13, Gordon & Breach, New York (1972)].
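
The step of combining model phases with observed amplitudes can be illustrated with a deliberately simplified one-dimensional toy. It is a sketch only: real molecular replacement works in three dimensions with rotation and translation searches, whereas here the "crystal" is a unit interval holding two point scatterers, and every name and number is made up.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """atoms: list of (fractional_x, weight). Returns {h: complex F(h)}."""
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density_map(observed_amplitudes, model_factors, n_grid):
    """Fourier synthesis from observed amplitudes and model phases."""
    combined = {
        h: amp * cmath.exp(1j * cmath.phase(model_factors[h]))
        for h, amp in observed_amplitudes.items()
    }
    return [
        sum(
            f * cmath.exp(-2j * math.pi * h * (i / n_grid))
            for h, f in combined.items()
        ).real
        for i in range(n_grid)
    ]


# "Unknown" structure: only its amplitudes are treated as measurable.
true_atoms = [(0.20, 6.0), (0.70, 8.0)]
f_obs = {h: abs(f) for h, f in structure_factors(true_atoms, 10).items()}

# Search model placed close to, but not exactly at, the true positions.
model_atoms = [(0.21, 6.0), (0.69, 8.0)]
rho = density_map(f_obs, structure_factors(model_atoms, 10), 100)
```

The list `rho` is the toy electron density sampled on 100 grid points; with a well-placed model its largest values fall near the true scatterer positions, which is the property that lets the map guide subsequent model building.
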

The structure of any portion of any crystallized molecule or
molecular complex, or mutant, homologue or orphan receptor that is
sufficiently homologous to any portion of the AR-LBD/AR-LBD ligand
complex can be solved by this method. Along with the aforementioned
AR, there also exist a number of AR for which the activating or
deactivating ligands may not be characterized. These proteins are
classified as AR due to strong sequence homology to other AR, and are
known as orphan receptors.

The structure coordinates are also particularly useful to solve
the structure of crystals of AR-LBD/AR-LBD ligand co-complexed with a
variety of chemical entities. This approach enables the determination of
the optimal sites for interaction between chemical entities, including
interaction of candidate AR inhibitors with the complex. For example,
high resolution X-ray diffraction data collected from crystals exposed to
different types of solvent allows the determination of where each type of
solvent molecule resides. Small molecules that bind tightly to these sites
can then be designed and synthesized and tested for their AR inhibition
activity.

All of the complexes referred to above may be studied using
well-known X-ray diffraction techniques and may be refined versus 1.5-3 Å
resolution X-ray data to an R value of about 0.20 or less using computer
software, such as X-PLOR [Yale University, 1992, distributed by
Molecular Simulations, Inc.; see, e.g., Blundell & Johnson, supra; Meth.
Enzymol., vol. 114 & 115, H. W. Wyckoff et al., eds., Academic Press
(1985)]. This information may thus be used to optimize known AR
agonists, partial agonists, antagonists, partial antagonists and SARMs,
and more importantly, to design new AR agonists/antagonists.
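
For readers unfamiliar with the R value mentioned above, the following sketch uses the conventional crystallographic definition, the sum of absolute differences between observed and calculated structure factor amplitudes divided by the sum of observed amplitudes. It assumes the two amplitude lists are already on a common scale; refinement programs such as X-PLOR apply their own scaling, which is not modelled here. The numbers are made up.

```python
def r_value(f_obs, f_calc):
    """Conventional R value for paired observed/calculated amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty, equally long amplitude lists")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for three reflections.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 30.0]
r = r_value(observed, calculated)
meets_target = r <= 0.20  # the "about 0.20 or less" criterion in the text
```

In this toy case the absolute differences sum to 20 against a total observed amplitude of 175, so R is about 0.11.
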

Accordingly, the present invention is also directed to a binding
site in AR-LBD for an AR-LBD ligand in which a portion of AR-LBD
ligand is in van der Waals contact or hydrogen bonding contact with at
least one of the following residues: V685, L700, L701, S702, S703, L704,
N705, E706, L707, G708, E709, Q711, A735, I737, Q738, Y739, S740,
W741, M742, G743, L744, M745, V746, F747, A748, M749, G750, R752,
Y763, F764, A765, L768, F770, M780, M787, I869, L873, H874, F876,
T877, F878, L880, L881, V889, F891, P892, E893, M894, M895, A896,
E897, I898, I899, S900, V901, Q902, V903, P904 or I906 of AR-LBD.
For purposes of this invention, by AR-LBD binding site it is also meant to
include mutants or homologues thereof. In a preferred embodiment, the
mutants or homologues have at least 25% identity, more preferably 50%
identity, more preferably 75% identity, and most preferably 95% identity
to residues V685, L700, L701, S702, S703, L704, N705, E706, L707,
G708, E709, Q711, A735, I737, Q738, Y739, S740, W741, M742, G743,
L744, M745, V746, F747, A748, M749, G750, R752, Y763, F764, A765,
L768, F770, M780, M787, I869, L873, H874, F876, T877, F878, L880,
L881, V889, F891, P892, E893, M894, M895, A896, E897, I898, I899,
S900, V901, Q902, V903, P904 or I906 of AR-LBD binding sites.
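
Identifying which residues contact a bound ligand is, at its simplest, a distance search. The sketch below is one hedged way to do it: the 4.0 Å cutoff is an assumption chosen for illustration (the text gives no numeric definition of van der Waals or hydrogen bonding contact), and the coordinates are invented rather than taken from Table A.

```python
import math


def residues_in_contact(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return sorted residue labels having any atom within `cutoff`
    angstroms of any ligand atom.

    ligand_atoms: list of (x, y, z)
    protein_atoms: list of (residue_label, (x, y, z))
    """
    contacts = set()
    for label, position in protein_atoms:
        if any(math.dist(position, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(label)
    return sorted(contacts)


# Made-up coordinates: two residues near the ligand atom, one far away.
ligand = [(0.0, 0.0, 0.0)]
protein = [
    ("N705", (3.0, 0.0, 0.0)),
    ("R752", (0.0, 3.5, 0.0)),
    ("L880", (10.0, 0.0, 0.0)),
]
found = residues_in_contact(ligand, protein)
print(found)  # ['N705', 'R752']
```
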

The present invention is also directed to a machine-readable
data storage medium, comprising a data storage material encoded with
machine readable data, wherein the data is defined by the structure
coordinates of an AR-LBD/AR-LBD ligand according to Table A or a
homologue of said complex, wherein said homologue comprises
backbone atoms that have a root mean square deviation from the
backbone atoms of the complex of not more than 3.0 Å. Preferably, the
machine-readable data storage medium, according to the invention, is
one wherein said molecule or molecular complex is defined by the set of
structure coordinates for AR-LBD/AR-LBD ligand according to Table A,
or a homologue of said molecule or molecular complex, said homologue
having a root mean square deviation from the backbone atoms of said
amino acids of not more than 2.0 Å. In a preferred embodiment the
machine-readable data storage medium comprises a data storage
material encoded with a first set of machine readable data comprising a
Fourier transform of at least a portion of the structural coordinates for
an AR-LBD/AR-LBD ligand according to Table A; which, when combined
with a second set of machine readable data comprising an X-ray
diffraction pattern of a molecule or molecular complex of unknown
structure, using a machine programmed with instructions for using said
first set of data and said second set of data, can determine at least a
portion of the structure coordinates corresponding to the second set of
machine readable data, said first set of data and said second set of data.

The present invention also provides for computational methods
using three dimensional models of the androgen receptor that are based
on crystals of AR-LBD/AR-LBD ligand complex. Generally, the
computational method of designing an androgen receptor ligand
determines which amino acid or amino acids of the AR-LBD interact with
a chemical moiety (at least one) of the ligand using a three dimensional
model of a crystallized protein comprising the AR-LBD with a bound
ligand, and selecting a chemical modification (at least one) of the
chemical moiety to produce a second chemical moiety with a structure
that either decreases or increases an interaction between the interacting
amino acid and the second chemical moiety compared to the interaction
between the interacting amino acid and the corresponding chemical
moiety on the natural hormone.

The computational methods of the present invention are for
designing androgen receptor synthetic ligands using such crystal and
three dimensional structural information to generate synthetic ligands
that modulate the conformational changes of the androgen receptor's
LBD. These computational methods are particularly useful in designing
an agonist, partial agonist, antagonist or partial antagonist or SARMs to
the androgen receptor, wherein the agonist, partial agonist, antagonist or
partial antagonist or SARMs has an extended moiety that prevents any
one of a number of ligand-induced molecular events that alter the
receptor's influence on the regulation of gene expression, such as
preventing the normal coordination of the activation domain observed for
a naturally occurring ligand or other ligands that mimic the naturally
occurring ligand, such as an agonist. As described herein, synthetic
ligands of the androgen receptor will be useful in modulating androgen
receptor activity in a variety of medical conditions.

AR is known to comprise various domains as follows: 

1) a variable amino-terminal domain;

2) a highly conserved DNA-binding domain (DBD); and 

3) a less conserved carboxyl-terminal ligand-binding domain (LBD). 

This modularity permits different domains of each protein to separately
accomplish different functions, although the domains can influence each
other. The separate function of a domain is usually preserved when a
particular domain is isolated from the remainder of the protein. Using
conventional protein chemistry techniques a modular domain can
sometimes be separated from the parent protein. Using conventional
molecular biology techniques each domain can usually be separately
expressed with its original function intact, or chimeras of two different
nuclear receptors can be constructed, wherein the chimeras retain the
properties of the individual functional domains of the respective nuclear
receptors from which the chimeras were generated.

Amino Terminal Domain

The amino terminal domain is the least conserved of the three
domains. This domain is involved in transcriptional activation and in
some cases its uniqueness may dictate selective receptor-DNA binding
and activation of target genes by specific receptor isoforms. This domain
can display synergistic and antagonistic interactions with the domains of
the LBD. For example, studies with mutated and/or deleted receptors
show positive cooperativity of the amino and carboxy terminal domains.
In some cases, deletion of either of these domains will abolish the
receptor's transcriptional activation functions.

DNA-Binding Domain

The DBD is the most conserved domain. The DBD contains two
perpendicularly oriented α-helices that extend from the base of the first
and second zinc fingers. The two zinc fingers function in concert along
with non-zinc finger residues to direct nuclear receptors to specific target
sites on DNA and to align receptor homodimer or heterodimer interfaces.
Various amino acids in the DBD influence spacing between two half-sites
for receptor dimer binding.

Ligand or AR Binding Domain

The LBD is the second most highly conserved domain. Whereas
integrity of several different LBD sub-domains is important for ligand
binding, truncated molecules containing only the LBD retain normal
ligand-binding activity. This domain also participates in other functions,
including dimerization, nuclear translocation and transcriptional
activation. Importantly, this domain is the binding site for ligands, i.e.
AR modulators, and undergoes ligand-induced conformational changes
as detailed herein.

As described herein, the LBD of AR can be expressed and
crystallized, its three dimensional structure determined with a ligand
bound (either using crystal data from the same receptor or a different
receptor or a combination thereof), and computational methods used to
design ligands to its LBD, particularly ligands that contain an extension
moiety that coordinates the activation domain of AR.

Once a computationally designed ligand (CDL) is synthesized, it
can be tested using assays to establish its activity as an agonist, partial
agonist, antagonist or partial antagonist or SARM, and its affinity, as
described herein. After such testing, the CDLs can be further refined by
generating LBD crystals with a CDL bound to the LBD. The structure of
the CDL can then be further refined using the chemical modification
methods described herein for three dimensional models to improve the
activity or affinity of the CDL and make second generation CDLs with
improved properties, such as that of a super agonist or antagonist.

Typically AR-LBD is purified to homogeneity for crystallization.
Purity of AR-LBD is measured with SDS-PAGE, mass spectrometry and
hydrophobic HPLC. The purified AR for crystallization should be at least
97.5% pure or 97.5% pure, preferably at least 99.0% pure or 99.0% pure,
more preferably at least 99.5% pure or 99.5% pure.

Initial purification of the unliganded receptor can be obtained
by conventional techniques, such as hydrophobic interaction
chromatography (HPLC), ion exchange chromatography (HPLC), and
heparin affinity chromatography.

To achieve higher purification for improved crystals of AR, it will
be desirable to ligand-shift purify the nuclear receptor using a column
that separates the receptor according to charge, such as an ion exchange
or hydrophobic interaction column, and then bind the eluted receptor
with a ligand, especially an agonist or partial agonist. The ligand induces
a change in the receptor's surface charge such that, when
re-chromatographed on the same column, the receptor elutes at the
position of the liganded receptor, while contaminants that would elute at
that position have already been removed by the original column run
with the unliganded receptor. Usually saturating concentrations of
ligand are used in the column and the protein can be preincubated with
the ligand prior to passing it over the column.

More recently developed methods involve engineering a "tag",
such as a histidine tag, placed on the end of the protein, such as on the
amino terminus, and then using a nickel chelation column for
purification (Janknecht R., Proc. Natl. Acad. Sci. USA Vol 88:8972-8976
(1991), incorporated by reference).

To determine the three dimensional structure of an AR-LBD, it is
desirable to co-crystallize the LBD with a corresponding LBD ligand.

Typically purified AR-LBD is equilibrated at a saturating
concentration of ligand at a temperature that preserves the integrity of
the protein. Ligand equilibration can be established between 2 and 37°C,
although the receptor tends to be more stable in the 2-20°C range.

Preferably crystals are made with the hanging drop method.
Regulated temperature control is desirable to improve crystal stability
and quality. Temperatures between 4 and 25°C are generally used and it
is often preferable to test crystallization over a range of temperatures. It
is preferable to use crystallization temperatures from 18 to 25°C, more
preferably 20 to 23°C, and most preferably 22°C.

Ligands that interact with AR can act as an agonist, partial 
agonist, antagonist or partial antagonist or SARM based on what ligand- 
induced conformational changes take place. 

Agonists or partial agonists induce changes in receptors that
place them in an active conformation that allows them to influence
transcription, either positively or negatively. There may be several
different ligand-induced changes in the receptor's conformation.

Antagonists or partial antagonists bind to receptors, but fail to
induce conformational changes that alter the receptor's transcriptional
regulatory properties or physiologically relevant conformations. Binding
of an antagonist or partial antagonist can also block the binding and
therefore the actions of an agonist or partial agonist.

Partial agonists, or partial antagonists, bind to receptors and
induce only part of the changes in the receptors that are induced by
agonists or antagonists, respectively. The differences can be qualitative
or quantitative. Thus, a partial agonist or partial antagonist may induce
some of the conformation changes induced by agonists or antagonists,
respectively, but not others, or it may only induce certain changes to a
limited extent.

As described herein, the unliganded receptor is in a
configuration that is either inactive, has some activity or has repressor
activity. Binding of agonist ligands induces conformational changes in
the receptor such that the receptor becomes more active, either to
stimulate or repress the expression of genes. The receptors may also
have non-genomic actions. Some of the known types of changes and/or
the sequelae of these are listed herein.

Heat shock protein binding domains present a region for binding
to the LBD and can be modulated by the binding of a ligand to the LBD.
Consequently, an extended chemical moiety (or more) from the ligand
that stabilizes the binding or contact of the heat shock protein binding
domain with the LBD can be designed. Typically such chemical
moieties will extend past and away from the molecular recognition
domain on the ligand and usually past the buried binding cavity of the
ligand.

Ligand binding by the receptor is a dynamic process, which 
regulates receptor function by inducing an altered conformation. 

The three-dimensional structure of the liganded AR receptor will 
greatly aid in the development of new AR synthetic ligands. In addition, 
AR is overall well suited to modern methods including three-dimensional 
structure elucidation and combinatorial chemistry such as those
disclosed in EP 335 628 and U.S. Patent 5,463,564, which are incorporated
herein by reference. Computer programs that use crystallography data
when practicing the present invention will enable the rational design of
ligands to AR. Programs such as RASMOL can be used with the atomic
coordinates from crystals generated by practicing the invention or used
to practice the invention by generating three dimensional models and/or
determining the structures involved in ligand binding. Computer
programs such as INSIGHT and GRASP allow for further manipulation 
and the ability to introduce new structures. In addition, high throughput 
binding and bioactivity assays can be devised using purified recombinant 
protein and modern reporter gene transcription assays described herein 
and known in the art in order to refine the activity of a CDL. 

Generally the computational method of designing an AR 
synthetic ligand comprises two steps: 

1) determining which amino acid or amino acids of AR-LBD interact
with a first chemical moiety (at least one) of the ligand using a three
dimensional model of a crystallized protein comprising an AR-LBD with a
bound ligand; and

2) selecting a chemical modification (at least one) of the first chemical
moiety to produce a second chemical moiety with a structure to either
decrease or increase an interaction between the interacting amino acid
and the second chemical moiety compared to the interaction between the
interacting amino acid and the first chemical moiety.

Preferably the method is carried out wherein said three dimensional
model is generated by comparing isomorphous ligand derivatives to
produce improved phasing. Further preferred is wherein said method
comprises determining a change in interaction between said interacting
amino acid and said ligand after chemical modification of said first
chemical moiety, especially wherein said three dimensional model is
generated by comparing isomorphous ligand derivatives to produce
improved phasing. Also preferred is wherein said selecting uses said
first chemical moiety that interacts with at least one of the interacting
amino acids V685, L700, L701, S702, S703, L704, N705, E706, L707,
G708, E709, Q711, A735, I737, Q738, Y739, S740, W741, M742, G743,
L744, M745, V746, F747, A748, M749, G750, R752, Y763, F764, A765,
L768, F770, M780, M787, I869, L873, H874, F876, T877, F878, L880,
L881, V889, F891, P892, E893, M894, M895, A896, E897, I898, I899,
S900, V901, Q902, V903, P904 or I906.

As shown herein, interacting amino acids form contacts with the
ligand and the centers of the atoms of the interacting amino acids are
usually 2 to 4 angstroms away from the centers of the atoms of the
ligand. Generally these distances are determined by computer as
discussed herein and in McRee 1993; however, distances can be
determined manually once the three dimensional model is made. See
also Wagner et al., Nature 378(6558):670-697 (1995) for stereochemical
figures of three dimensional models. More commonly, the atoms of the
ligand and the atoms of interacting amino acids are 3 to 4 angstroms
apart. The invention can be practiced by repeating steps 1 and 2 to refine
the fit of the ligand to the LBD and to determine a better ligand, such as
an agonist, partial agonist, antagonist or partial antagonist or SARM.
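
The distance criterion described above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the function name `find_contacts` is hypothetical, the residue labels are borrowed from the list of interacting amino acids, and the coordinates are invented placeholders rather than values from Table A. It reports every residue that has an atom within a chosen cutoff (here 4 angstroms) of any ligand atom.

```python
import math

def find_contacts(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted residue labels with any atom within `cutoff` angstroms of a ligand atom."""
    contacts = set()
    for residue, p_xyz in protein_atoms:
        for l_xyz in ligand_atoms:
            # math.dist gives the Euclidean distance between atom centers
            if math.dist(p_xyz, l_xyz) <= cutoff:
                contacts.add(residue)
                break
    return sorted(contacts)

# Placeholder coordinates in angstroms (NOT taken from Table A).
protein_atoms = [
    ("N705", (4.2, 0.0, 0.0)),
    ("T877", (0.0, 3.5, 0.0)),
    ("R752", (0.0, 0.0, 2.9)),
    ("V685", (9.0, 9.0, 9.0)),
]
ligand_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

print(find_contacts(protein_atoms, ligand_atoms))              # 4 angstrom cutoff
print(find_contacts(protein_atoms, ligand_atoms, cutoff=3.0))  # stricter cutoff
```

Repeating such a calculation after each proposed chemical modification gives a simple way to see which contacts are gained or lost between rounds of steps 1 and 2.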

The three dimensional model of AR can be represented in two
dimensions to determine which amino acids contact the ligand and to
select a position on the ligand for chemical modification and changing
the interaction with a particular amino acid compared to that before
chemical modification. The chemical modification may be made using a
computer, manually using a two dimensional representation of the three
dimensional model or by chemically synthesizing the ligand. The ligand
can also interact with distant amino acids after chemical modification of
the ligand to create a new ligand. Distant amino acids are generally not
in contact with the ligand before chemical modification. A chemical
modification can change the structure of the ligand to make a new
ligand that interacts with a distant amino acid usually at least 4.5
angstroms away from the ligand, preferably wherein said first chemical
moiety is 6 to 12 angstroms away from a distant amino acid. Often
distant amino acids will not line the surface of the binding cavity for the
ligand, because they are too far away from the ligand to be part of a
pocket or binding cavity. The interaction between an LBD amino acid and
an atom of an LBD ligand can be made by any force or attraction described
in nature. Usually the interaction between the atom of the amino acid and
the ligand will be the result of a hydrogen bonding interaction, charge
interaction, hydrophobic interaction, van der Waals interaction or dipole
interaction. In the case of the hydrophobic interaction it is recognized
that this is not a per se interaction between the amino acid and ligand,
but rather the usual result, in part, of the repulsion of water or other
hydrophilic group from a hydrophobic surface. Reducing or enhancing
the interaction of the LBD and a ligand can be measured by calculating
or testing binding energies, computationally or using thermodynamic or
kinetic methods as known in the art.
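
One standard thermodynamic relation that can be used for such comparisons links the dissociation constant to the standard binding free energy, ΔG = RT ln(Kd), with Kd expressed in mol/L. The sketch below applies it to two hypothetical Kd values (10 nM for a parent ligand and 1 nM for a modified ligand); these numbers and the function name are illustrative assumptions and are not measurements reported in this document.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)

# Hypothetical Kd values for a parent ligand and a chemically modified ligand.
dg_parent = binding_free_energy(1e-8)
dg_modified = binding_free_energy(1e-9)

# A more negative value indicates tighter binding.
print(round(dg_parent, 2), round(dg_modified, 2), round(dg_modified - dg_parent, 2))
```

Under these assumptions a tenfold improvement in Kd corresponds to a gain of roughly 1.4 kcal/mol at 25°C, which gives a sense of scale for the effect of adding or removing a single favorable contact.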

Chemical modifications will often enhance or reduce interactions
of an atom of an LBD amino acid and an atom of an LBD ligand. Steric
hindrance will be a common means of changing the interaction of the
LBD binding cavity with the activation domain.

The present invention also provides methods for identifying
compounds that modulate androgen receptor activity. Various methods
or combinations thereof can be used to identify these compounds. For
example, test compounds can be modeled that fit spatially into the
AR-LBD as defined by structure coordinates according to Table A, or using
a three-dimensional structural model of AR-LBD, mutant AR-LBD or
AR-LBD homolog or portion thereof. Structure coordinates of the ligand
binding site, in particular amino acids V685, L700, L701, S702, S703,
L704, N705, E706, L707, G708, E709, Q711, A735, I737, Q738, Y739,
S740, W741, M742, G743, L744, M745, V746, F747, A748, M749, G750,
R752, Y763, F764, A765, L768, F770, M780, M787, I869, L873, H874,
F876, T877, F878, L880, L881, V889, F891, P892, E893, M894, M895,
A896, E897, I898, I899, S900, V901, Q902, V903, P904 or I906 can also
be used to identify structural and chemical features. Identified
structural or chemical features can then be employed to design or select
compounds as potential AR modulators. By structural and chemical
features it is meant to include, but is not limited to, van der Waals
interactions, hydrogen bonding interactions, charge interaction,
hydrophobic bonding interaction, hydrophobic interaction and dipole
interaction. Alternatively, or in conjunction, the three-dimensional
structural model or the ligand binding site can be employed to design or
select compounds as potential AR modulators. Compounds identified as
potential AR modulators can then be synthesized and screened in an
assay characterized by binding of a test compound to the AR-LBD.
Examples of assays useful in screening of potential AR modulators
include, but are not limited to, screening in silico, in vitro assays and
high throughput assays. Finally, these methods may also involve
modifying or replacing one or more amino acids from AR-LBD such as
V685, L700, L701, S702, S703, L704, N705, E706, L707, G708, E709,
Q711, A735, I737, Q738, Y739, S740, W741, M742, G743, L744, M745,
V746, F747, A748, M749, G750, R752, Y763, F764, A765, L768, F770,
M780, M787, I869, L873, H874, F876, T877, F878, L880, L881, V889,
F891, P892, E893, M894, M895, A896, E897, I898, I899, S900, V901,
Q902, V903, P904 or I906 of AR-LBD according to Table A.

A preferred method of the invention can be described as a
computational method of designing an androgen receptor antagonist
from an androgen receptor agonist comprising:

1) determining a structure of a molecular recognition domain of
said agonist using a three dimensional model of a crystallized
protein comprising an AR-LBD; and

2) selecting at least one chemical modification of said agonist
that provides a ligand structure that extends beyond a
binding site for said agonist and in the direction of at least
one protein domain important in AR biological function.

Another preferred method of the invention can be described as a
computational method of designing a selective androgen receptor
modulator such as an androgen receptor super agonist or antagonist
comprising:

1) determining at least one interacting amino acid of an AR-LBD
that interacts with at least one first chemical moiety of said
ligand using a three dimensional model of a crystallized
protein comprising AR-LBD with a bound ligand; and

2) selecting at least one chemical modification of said first
chemical moiety to produce a second chemical moiety with a
structure to reduce or enhance an interaction between said
interacting amino acid and said second chemical moiety
compared to said interaction between said interacting amino
acid and said first chemical moiety.

However, as will be understood by those of skill in the art upon this
disclosure, other structure based design methods can be used. Various
computational structure based design methods have been disclosed in
the art.

For example, a number of computer modeling systems are
available in which the sequence of the AR-LBD and the AR-LBD
structure (i.e., atomic coordinates of AR-LBD and/or the atomic
coordinates of the active site, the bond and dihedral angles, and
distances between atoms in the active site such as provided in Table A)
can be input. The computer system then generates the structural
details of the site in which a potential AR modulator binds so that
complementary structural details of the potential modulators can be
determined. Design in these modeling systems is generally based upon
the compound being capable of physically and structurally associating
with AR-LBD. In addition, the compound must be able to assume a
conformation that allows it to associate with AR-LBD. Some modeling
systems estimate the potential inhibitory or binding effect of a potential
AR modulator prior to actual synthesis and testing.

Methods for screening chemical entities or fragments for their
ability to associate with AR-LBD are also well known. Often these
methods begin by visual inspection of the active site on the computer
screen. Selected fragments or chemical entities are then positioned with
the AR-LBD. Docking is accomplished using software such as QUANTA
and SYBYL, followed by energy minimization and molecular dynamics
with standard molecular mechanics force fields such as CHARMM and
AMBER. Examples of computer programs which assist in the selection
of chemical fragments or chemical entities useful in the present invention
include, but are not limited to, GRID (Goodford, P.J. J. Med. Chem.
1985 28:849-857), AUTODOCK (Goodsell, D.S. and Olson, A.J. Proteins:
Structure, Function, and Genetics 1990 8:195-202), and DOCK (Kuntz
et al. J. Mol. Biol. 1982 161:269-288).

Upon selection of preferred chemical entities or fragments, their
relationship to each other and AR-LBD can be visualized and the entities
or fragments can be assembled into a single potential modulator.
Programs useful in assembling the individual chemical entities include,
but are not limited to, CAVEAT (Bartlett et al. Molecular Recognition in
Chemical and Biological Problems, Special Publication, Royal Chem. Soc.
78, 182-196 (1989)) and 3D Database systems (Martin, Y.C. J. Med.
Chem. 1992 35:2145-2154).

Alternatively, compounds may be designed de novo using either
an empty active site or optionally including some portion of a known
inhibitor. Methods of this type of design include, but are not limited to,
LUDI (Bohm H-J, J. Comp. Aid. Molec. Design 1992 6:61-78) and
LeapFrog (Tripos Associates, St. Louis, MO).

The present invention is also directed to an AR-LBD selective 
androgen receptor modulator (SARM), in particular an agonist or 
antagonist or partial agonist or partial antagonist, identified by a 
computational process of the invention. 

The present invention is further directed to a method for treating
prostate cancer comprising administering an effective amount of an AR
modulator, preferably an antagonist or partial antagonist, identified by a
computational process of the invention.

The present invention is also directed to a method for treating an
age related disease comprising administering an effective amount of an
AR modulator, preferably an agonist or partial agonist, identified by a
computational process of the invention, preferably wherein said age
related disease is osteoporosis, muscle wasting or loss of libido.

Compounds identified as agonists, partial agonists, antagonists,
partial antagonists or SARMs by the methods disclosed herein which are
active when given orally can be formulated as liquids, for example syrups,
suspensions or emulsions, or as tablets, capsules and lozenges. A liquid
composition will generally consist of a suspension or solution of the
compound in a suitable liquid carrier(s), for example ethanol, glycerin,
sorbitol, a non-aqueous solvent such as polyethylene glycol, oils or water,
with a suspending agent, preservative, surfactant, wetting agent,
flavoring or coloring agent. Alternatively, a liquid formulation can be
prepared from a reconstitutable powder. For example, a powder
containing active compound, suspending agent, sucrose and a sweetener
can be reconstituted with water to form a suspension; and a syrup can
be prepared from a powder containing active ingredient, sucrose and a
sweetener.

A composition in the form of a tablet can be prepared using
any suitable pharmaceutical carrier(s) routinely used for preparing solid
compositions. Examples of such carriers include magnesium stearate,
starch, lactose, sucrose, microcrystalline cellulose and binders, for
example polyvinylpyrrolidone. The tablet can also be provided with a
color film coating, or color included as part of the carrier(s). In addition,
active compound can be formulated in a controlled release dosage form as
a tablet comprising a hydrophilic or hydrophobic matrix.

A composition in the form of a capsule can be prepared using routine
encapsulation procedures, for example by incorporation of active compound
and excipients into a hard gelatin capsule. Alternatively, a semi-solid
matrix of active compound and high molecular weight polyethylene glycol
can be prepared and filled into a hard gelatin capsule; or a solution of
active compound in polyethylene glycol or a suspension in edible oil, for
example liquid paraffin or fractionated coconut oil, can be prepared and
filled into a soft gelatin capsule.

Compounds identified by the processes described herein which are active
when given parenterally can be formulated for intramuscular or
intravenous administration. A typical composition for intramuscular
administration will consist of a suspension or solution of active
ingredient in an oil, for example arachis oil or sesame oil. A typical
composition for intravenous administration will consist of a sterile
isotonic aqueous solution containing, for example, active ingredient,
dextrose, sodium chloride, a co-solvent, for example polyethylene glycol,
and, optionally, a chelating agent, for example
ethylenediaminetetraacetic acid, and an anti-oxidant, for example sodium
metabisulphite. Alternatively, the solution can be freeze dried and then
reconstituted with a suitable solvent just prior to administration.

Identified compounds which are active on rectal administration can be
formulated as suppositories. A typical suppository formulation will
generally consist of active ingredient with a binding and/or lubricating
agent such as a gelatin or cocoa butter or other low melting vegetable or
synthetic wax or fat. Identified compounds which are active on topical
administration can be formulated as transdermal compositions. Such
compositions include, for example, a backing, active compound reservoir,
a control membrane, liner and contact adhesive.

The typical daily dose varies according to individual needs, the
condition to be treated and the route of administration. Suitable
doses are in the general range of from 0.001 to 10 mg/kg bodyweight of
the recipient per day.
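
To make the stated range concrete, the short sketch below converts the per-kilogram limits into an absolute daily amount for a given body weight. The 70 kg body weight and the function name are illustrative assumptions only; the 0.001 and 10 mg/kg limits are the ones given above.

```python
def daily_dose_range_mg(body_weight_kg, low_mg_per_kg=0.001, high_mg_per_kg=10.0):
    """Return (lowest, highest) daily dose in mg for the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)

# Hypothetical 70 kg recipient.
low_mg, high_mg = daily_dose_range_mg(70.0)
print(round(low_mg, 3), round(high_mg, 1))
```

For the assumed 70 kg recipient the general range corresponds to roughly 0.07 mg to 700 mg per day.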

The following examples are to illustrate the invention, but
should not be interpreted as a limitation thereon.
Examples 

Cloning, Expression and Purification of the Androgen Receptor 
Ligand-Binding Domain 

The rat androgen receptor (rAR) ligand-binding domain (LBD)
cDNA, from amino acid 646 to 901, was cloned from a rat prostate cDNA
library (Clontech) by PCR. The primers used were
CATATGATTGAAGGCTATGAATGTCAACCTATCTTT (SEQ ID NO:3) and
TCACTGTGTGTGGAAATAGATGGG (SEQ ID NO:4). The rat AR LBD was
expressed as a fusion protein driven by the T7 promoter of the pET28b
vector (Novagen) to include an N-terminal polyhistidine tag and a
thrombin cleavage site. The replacement of T877 with A (the LNCaP
mutation) in this rAR LBD expression construct was performed with the
QuikChange Site-Directed Mutagenesis kit (Stratagene).
Dihydrotestosterone (DHT) was included in the E. coli (BL21-DE3)
fermentation medium at a concentration of 0.05 mM. Induction with
0.4 mM isopropyl-β-D-thiogalactopyranoside was allowed to proceed for
16 hours at 20°C in M9 minimal media supplemented with casamino
acids (Difco) and trace minerals, and pellets were stored at -70°C. A
total of 6-9 mg of recombinant AR LBD was isolated from a 15 gram cell
pellet following sonication and chromatography on a nickel-chelate resin.
Polyhistidine-tagged AR LBD of approximately 90% purity eluted at 0.45
M imidazole in a gradient of 0.05-1.0 M imidazole. This material was
quantitatively cleaved at an engineered site for thrombin recognition,
followed by chromatography on benzamidine sepharose (Pharmacia) to
remove the serine protease, with a 70% recovery. The final sample,
containing the sequence Gly-Ser-His-Met at the N-terminus followed by
residues 646-901 of the rat (664-919 in the human) AR LBD protein,
was concentrated for crystallography to 2 mg/ml in 20 mM Tris (pH 7.5),
0.5 M NaCl, 10% glycerol, 1 mM EDTA and 1 mM DTT.
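
Because the construct is described in rat numbering (646-901) while the interacting residues elsewhere in this document use human numbering (664-919), it can help to convert between the two. The sketch below assumes a constant offset of 18 across the whole LBD, which is what the two stated ranges imply if there are no insertions or deletions between the rat and human sequences in this region; the function names are hypothetical.

```python
RAT_START, RAT_END = 646, 901
HUMAN_START, HUMAN_END = 664, 919
OFFSET = HUMAN_START - RAT_START  # 18, from the ranges stated in the text

def rat_to_human(rat_position):
    """Convert a rat AR LBD residue number to human numbering (constant offset assumed)."""
    if not RAT_START <= rat_position <= RAT_END:
        raise ValueError("position outside the cloned rat LBD range")
    return rat_position + OFFSET

def human_to_rat(human_position):
    """Convert a human AR LBD residue number to rat numbering (constant offset assumed)."""
    if not HUMAN_START <= human_position <= HUMAN_END:
        raise ValueError("position outside the human LBD range")
    return human_position - OFFSET

# Human T877 (the LNCaP mutation site) expressed in rat numbering.
print(human_to_rat(877))
```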

The sequence of the rat Androgen Receptor LBD (AR), as cloned,
is shown below with the secondary structural features marked. For
comparison, the aligned sequence of the Progesterone Receptor LBD (PR)
is given. Residues involved in androgen binding are marked (*). Residues
which are disordered in the crystal structure are underlined. The AR
sequence is SEQ ID NO:1. The PR sequence is SEQ ID NO:2.

     |-H1-|   |   H3
660  GSHMIEGYECQPIFLNVLEAIEPGWCAGHDNNQPDSFAALLSSLNELGE  AR
678  GQDIQLIPPLINLLMSIEPDVIYAGHDNTKPDTSSSLLTSLNQLGE  PR
     * *

     |   |   H4/5   |
710  RQLVHWKWAKALPGFRHLHVDDQMAVIQYSWMGLIWFAMGW  AR
724  RQLLSWKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLGWRSYKHVSG  PR

     SSSS SSS |-H6|   |   H7   |   |   H8--
760  RMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKA  AR
774  QMLYFAPDLILNEQRMKESSFYSLCLTMWQIPQEFVKLQVSQEEFLCMKV  PR

     --|   SSS   |   H9   |   |
810  LLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQL  AR
824  LLLLNTIPLEGLRSQTQFEEMRSSYIRELIKAIGLRQKGWSSSQRFYQL  PR

     -- H10/11 |   |-|   |   H12   |
860  TKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSG  AR
874  TKLLDNLHDLVKQLHLYCLNTFIQSRALSVEFPEMMSEVIAAQLPKILAG  PR
     *

     SSS
910  KVKPIYFHTQ  AR
924  MVKPLLFHK  PR



Crystallization 

The AR-LBD-dihydrotestosterone (DHT) complex was
crystallized at 20°C by vapor diffusion in the hanging-drop mode. In the
crystallization trials, the protein complex as obtained from MMB&B was
used without any further purification. In the initial trial to obtain
crystallization conditions, a sparse matrix crystallization screen was
done with Crystal Screens 1 and 2 (Hampton Research). For each
crystallization trial, a 2 µl drop was prepared by mixing 1 µl of purified
protein (1.9 mg/ml) with an equal volume of reservoir solution. The
reservoir contained 1.0 ml of the precipitating solution. Small crystals
were obtained in two days from six of the drops (Table 1).
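
The starting composition of such a drop follows from simple dilution arithmetic, sketched below. The function name is hypothetical, and the 0.8 M precipitant value is used only as an example of a reservoir concentration (it is the tartrate concentration of Crystal Screen 1, solution #29). The calculation ignores any contribution from the protein buffer components.

```python
def mix_drop(protein_mg_per_ml, protein_ul, reservoir_molar, reservoir_ul):
    """Return the initial drop volume and concentrations after mixing protein and reservoir."""
    total_ul = protein_ul + reservoir_ul
    return {
        "volume_ul": total_ul,
        # Each component is diluted by its share of the total drop volume
        "protein_mg_per_ml": protein_mg_per_ml * protein_ul / total_ul,
        "precipitant_molar": reservoir_molar * reservoir_ul / total_ul,
    }

# 1 ul of protein at 1.9 mg/ml mixed with 1 ul of an example 0.8 M reservoir solution.
drop = mix_drop(1.9, 1.0, 0.8, 1.0)
print(drop)
```

With equal volumes, both the protein and the precipitant start at half their stock concentrations; as water vapor leaves the drop to equilibrate against the 1.0 ml reservoir, both concentrations rise, which is what drives the protein toward supersaturation.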
Table 1: Crystallization Conditions

Screen/#   Precipitating Solution                                      Result

1/16       1.5 M Li Sulfate, 0.1 M Na Hepes, pH 7.5                    Small rods
1/29       0.8 M Na/K Tartrate, 0.1 M Na Hepes, pH 7.5                 Larger rods
1/30       2% v/v PEG 400, 2.0 M Am Sulfate, 0.1 M Na Hepes, pH 7.5    Small cubes
2/20       1.6 M Mg Sulfate, 0.1 M MES, pH 6.5                         Small crystallites
2/32       1.6 M Am Sulfate, 0.1 M NaCl, 0.1 M Hepes, pH 7.5           Small rods
2/42       12% v/v Glycerol, 1.5 M Am Sulfate, 0.1 M Tris, pH 8.5      Small rods



The largest single crystal, measuring 0.05 mm x 0.04 mm x 0.26 mm,
was obtained from Crystal Screen 1, solution #29 (0.8 M Na/K Tartrate,
0.1 M Na Hepes, pH 7.5). This crystal was subsequently used in the
initial data collection run (as described below).



Optimization of the crystallization condition was done using a
Cyberlab C-200 automated crystallization robotic workstation. A
crystallization trial was performed using a 24-step linear gradient from
0.6 M to 1.26 M Na tartrate, 100 mM Hepes, pH 7.5 (Note: The
optimization screen used sodium rather than sodium/potassium
tartrate). The largest, rod-shaped crystal, with dimensions 0.09 mm x
0.09 mm x 0.20 mm, was obtained at 0.887 M Na tartrate. This crystal
was used in the second data collection run (as described below).
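
The gradient just described can be reproduced with a short calculation. The sketch below assumes that "24-step" means 24 evenly spaced conditions that include both end points (0.6 M and 1.26 M); that interpretation and the function name are assumptions, not statements from the source. It then looks up the condition closest to the reported 0.887 M.

```python
def linear_gradient(start, stop, steps):
    """Return `steps` evenly spaced concentrations from start to stop, inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps")
    increment = (stop - start) / (steps - 1)
    return [start + i * increment for i in range(steps)]

# Assumed grid: 24 tartrate concentrations from 0.6 M to 1.26 M.
grid = linear_gradient(0.6, 1.26, 24)

# Index of the grid point closest to the reported crystal condition.
closest = min(range(len(grid)), key=lambda i: abs(grid[i] - 0.887))
print(closest + 1, round(grid[closest], 3))
```

Under this assumption the increment is about 0.029 M per step, and the eleventh condition rounds to 0.887 M, consistent with the concentration reported for the largest crystal.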
Data Collection and Reduction 

25 For the initial X-ray experiment, the crystal from the initial 

crystallization screen was flash cooled by dipping it in a cryoprotectant 
solution containing the precipitating solution (0.8 M Na/K Tartrate, 0.1M 
Na Hepes, pH 7.5) with 250mm NaCl and 20% Glycerol added and then 
placed it in a cold stream at 100° K. 

30 For data set 1, X-ray diffraction data were collected with an R- 

Axis II imaging plate detector. The radiation was generated from a 
Rigaku RU-200 rotating at 5 kw power with a fine focus filament (0.3 x 
3.0mm) was monchromated (Cu Ka) and intensified by focusing with 
Yale mirrors (Molecular Structure Corporation). The crystal diffracted to 

35 better than 2.4 A resolution. Autoindexing and processing of the 

measured intensity data was carried out with the HKL software package 
(Otwinoski, L. (1993) in CCP4 Study Weekend, Data Collection and 



Optimization of the crystallization condition was done using a 
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Processing (Sawyer,L., Issacs, N., and Bailey, S., Eds.) pp 56-62, SERC 
Daresbury Laboratory, Warrington, U.K). X-ray diffraction from the 
crystals have the symmetry and systematic absences of the 
orthorhombic space group P212121 with unit cell dimensions a = 56.03 
5 A , b = 66.27 A, c= 70.38 A, and one molecule per asymmetric unit 
(Mathews Volume = 2. 16 A 3 Da- 1 ). 
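
The Matthews volume quoted for data set 1 can be reproduced from the unit cell and the molecular weight. The following is a minimal sketch, assuming four asymmetric units per P212121 cell and the LBD molecular weight of 30,245 Da given in the Background. The function names and the 1.23 Å³/Da constant used for the rough solvent estimate are illustrative assumptions and are not part of the original procedure.

```python
def matthews_volume(a, b, c, mol_weight_da, asu_per_cell=4, molecules_per_asu=1):
    """Return the Matthews volume (cubic angstroms per dalton) for an orthorhombic cell."""
    cell_volume = a * b * c  # all cell angles are 90 degrees in an orthorhombic cell
    return cell_volume / (asu_per_cell * molecules_per_asu * mol_weight_da)


def solvent_fraction(vm, protein_volume_per_da=1.23):
    """Rough solvent fraction estimate; the 1.23 constant is an assumed typical value."""
    return 1.0 - protein_volume_per_da / vm


# Cell dimensions and molecular weight as stated in the text
vm = matthews_volume(56.03, 66.27, 70.38, 30245.0)
print(f"Matthews volume: {vm:.2f} A^3/Da")
print(f"Estimated solvent fraction: {solvent_fraction(vm):.2f}")
```

With one molecule per asymmetric unit the sketch gives a value that rounds to the 2.16 Å³/Da stated above; the solvent fraction is only an estimate under the stated assumption.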

A second X-ray diffraction data set (data set 2) was collected at
the IMCA-CAT beamline (sector 17ID) at the Advanced Photon Source
synchrotron at Argonne, IL. The crystal from the optimization screen
described above was flash-cooled by placing it in the reservoir solution
(0.887 M Na Tartrate, 0.1 M Na Hepes, pH 7.5) with 250 mM NaCl and
20% glycerol added, and then placing it in a cold stream at 100 K. The
data were collected with a Bruker 2x2 mosaic CCD detector. The crystal
diffracted to better than 2.0 Å. Autoindexing and processing of the
measured intensity data were carried out with the HKL2000 software
package (Otwinowski, Z. (1993) in CCP4 Study Weekend, Data Collection
and Processing (Sawyer, L., Isaacs, N., and Bailey, S., Eds.) pp 56-62,
SERC Daresbury Laboratory, Warrington, U.K.). The data collection and
processing statistics for both data sets are summarized in table 2.

Structure Determination (Molecular Replacement)

The structure was determined by the method of molecular
replacement with the program AMoRe (Navaza, J. (1994) AMoRe: an
automated package for molecular replacement. Acta Cryst. A50,
157-163). The progesterone receptor ligand binding domain (PR-LBD), which
has 54% sequence identity and 76% sequence homology to AR-LBD, was
used as the search model. The atomic coordinates of PR-LBD (Protein
Data Bank reference code 1A28) by Williams & Sigler (Nature 1998 393,
391) were unmodified except for the removal of the ligand and solvent
molecules. A second molecular replacement search was performed with
a theoretical model for the AR-LBD provided by the MMS/CADD group
(table 3). The PR-LBD structure gave a slightly better solution than the
AR model (1.7σ vs. 1.3σ above background) and was used in the
subsequent refinement, although both structures gave equivalent results
with no molecular interpenetration.
Table 2: Data Collection and Processing

                             Data Set 1            Data Set II

Date                         5/19/99               6/17/99
Source                       Rigaku RU-200         IMCA/APS 17ID
Detector                     R-Axis II             Bruker 2x2
Wavelength                   Cu Kα (1.54 Å)        1.00 Å
Frames                       [illegible]           400
Crystal to plate distance    [illegible]           [illegible]
Time/frame                   [illegible]           [illegible]
Number of Observations       [illegible]           [illegible]
Data Reduction Program       HKL                   HKL2000
Unique reflections           [illegible]           [illegible]
Reflections used             10,114                [illegible]
Resolution                   2.4 Å (2.5-2.4 Å)     2.0 Å (2.1-2.0 Å)
Completeness                 [illegible]           [illegible]
Multiplicity                 [illegible]           [illegible]
Mosaicity                    [illegible]           [illegible]
Rsym (on I)                  4.2 % (17.5%)         10.1 % (25.6%)
Space Group                  P212121               P212121
a                            56.09 Å               56.08 Å
b                            66.43 Å               65.76 Å
c                            70.54 Å               70.51 Å
Wilson B-value               39.05 Å²              29.26 Å²

Values for data in the last resolution shell are given in parentheses.



Table 3: Molecular Replacement Statistics

Search Model:                    Progesterone          AR Model
                                 (PDB file 1A28)

Program Used                     AMoRe                 AMoRe
Resolution Range                 8.0-4.0 Å             8.0-4.0 Å
Radius of Integration            25 Å                  25 Å
Number of Reflections            2,393                 2,393
Number of Atoms                  2,019                 2,094
RF Correlation (2nd solution)    0.16 (0.12)           0.13 (0.11)
TF Correlation (2nd solution)    0.31 (0.20)           0.23 (0.14)
TF R-factor (2nd solution)       49.0% (52.7%)         52.1% (54.0%)
Rigid Body Correlation           0.34                  0.28
Rigid Body R-factor              48.1%                 50.4%


Structure Refinement

The structure was first refined with the initial 2.4 Å data set (2σ
data, 9,818 reflections) by the method of simulated annealing with the
program X-PLOR (Brünger, A.T., Kuriyan, J. & Karplus, M. (1987)
"Crystallographic R-factor refinement by molecular dynamics", Science
235: 458-460) in four cycles to an R-factor of 27.7%. Each refinement
cycle consisted of a least-squares minimization, simulated annealing at
3000 K, and individual isotropic B-factor refinement. The first cycle, with
the progesterone receptor molecular replacement model unmodified for the
sequence differences between AR and PR, gave an R-factor of 33.8%. The
model was then rebuilt using the AR amino acid sequence and a second
refinement cycle gave an R-factor of 29.6%. At this stage of the
refinement, the DHT molecule could be clearly seen in the difference
electron density map.

After each cycle, the structure was carefully examined using the
molecular graphics program CHAIN (Sack, John S. (1988)
"CHAIN - A Crystallographic Modeling Program", J. Mol. Graphics 6:
224-225) and modifications were made to the structure as needed. Several
residues, from both the N- and C-termini of the molecule, which were not
seen in the electron density maps were removed from the model. After
the second cycle of refinement, the DHT was added to the model. Solvent
molecules were added where there were 3σ peaks in both the 2Fo - Fc
and Fo - Fc electron density maps and removed if their B-factor went
above 60 Å². After four cycles of X-PLOR refinement, a careful
examination of the electron density showed the model to be much
improved, although molecular refitting still needed to be done in some
regions. The density is clear except for some of the loop regions,
particularly the loop between helices I and II, which was also poorly
modeled in the PR structure.
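
The solvent selection rule described above (3σ peaks in both maps, removal when the B-factor exceeds 60 Å²) can be sketched as a simple filter. This is an illustration only: the `Water` record, the `keep_water` function, and the candidate values are hypothetical, and in the actual procedure the peak check was applied when waters were added while the B-factor check was applied after refinement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Water:
    label: str
    peak_2fofc: float                 # peak height in the 2Fo - Fc map, in sigma units
    peak_fofc: float                  # peak height in the Fo - Fc map, in sigma units
    b_factor: Optional[float] = None  # refined B-factor in A^2; None if not yet refined


def keep_water(w, min_peak=3.0, max_b=60.0):
    """Keep a water only if both map peaks reach min_peak and its B-factor is not above max_b."""
    if w.peak_2fofc < min_peak or w.peak_fofc < min_peak:
        return False
    if w.b_factor is not None and w.b_factor > max_b:
        return False
    return True


# Hypothetical candidates, for illustration only
candidates = [
    Water("W1", 4.2, 3.5, 28.0),   # passes both criteria
    Water("W2", 3.8, 2.1, 35.0),   # Fo - Fc peak too weak
    Water("W3", 5.0, 4.4, 72.5),   # B-factor above the cutoff
    Water("W4", 3.1, 3.0),         # newly picked, not yet refined
]
kept = [w.label for w in candidates if keep_water(w)]
print(kept)
```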

Table 4: Refinement Statistics (X-PLOR)

Part I: 2σ data (9,818 reflections) to 2.4 Å

Cycle 1    251 residues    No ligand    0 waters      R = 33.8 %
Cycle 2    248 residues    No ligand    0 waters      R = 29.6 %
Cycle 3    247 residues    ligand       18 waters     R = 28.3 %
Cycle 4    246 residues    ligand       40 waters     R = 27.7 %

Part II: 2σ data (15,067 reflections) to 2.0 Å

Cycle 5    246 residues    ligand       32 waters     R = 27.9 %
Cycle 6    246 residues    ligand       57 waters     R = 26.8 %
Cycle 7    246 residues    ligand       58 waters     R = 26.7 %
Cycle 8    246 residues    ligand       106 waters    R = 24.2 %



At this stage of the refinement, the higher resolution data collected at the
APS synchrotron became available. Four additional X-PLOR refinement
cycles were performed with the 2.0 Å data set (2σ data, 15,067
reflections) following the same protocol. The final structure has an
R-factor of 24.2% with a total of 106 solvent molecules. The final
refinement statistics are presented in table 5.

Table 5: Final Refinement Parameters

Resolution Range        10.0 - 2.0 Å
Reflections             15,067
R-factor                24.2 %
R-free                  31.2 %
# residues              246 (672-917)
# atoms                 2,118 (1,991 protein atoms, 21 DHT, 106 waters)
RMS deviations
    bond lengths        0.014 Å
    bond angles         1.594°
    improper angles     1.558°
Average B-factors
    Protein             25.02 Å²
    DHT                 14.40 Å²
    Water               30.21 Å²
Wilson B-factor         29.26 Å²
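
For readers unfamiliar with the R-factor and R-free entries, the following is a minimal sketch of how these statistics are conventionally defined. It assumes the observed and calculated amplitudes are already on a common scale; the function names and the toy amplitudes are hypothetical and are not taken from the refinement described here.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_r_factors(f_obs, f_calc, free_flags):
    """Return (R-work, R-free); free_flags marks reflections held out of refinement."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*test))
    return r_work, r_free


# Hypothetical amplitudes, already on a common scale
f_obs = [100.0, 50.0, 25.0, 10.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 20.0, 12.0, 70.0, 46.0]
free_flags = [False, False, False, False, True, True]
r_work, r_free = split_r_factors(f_obs, f_calc, free_flags)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

Because the held-out reflections never influence the model, R-free is normally higher than the working R-factor, as is the case for the 24.2% and 31.2% values in table 5.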

Description of the Molecule

The structure of AR-LBD is complete from residues 671 through
917 for the wild-type and 672 to 918 for the LNCaP mutant. Analysis of
the structures with the program PROCHECK showed only minor exceptions
to the allowed geometry. In the wild-type structure, the first six residues
of the chain (664 - 670) are not seen in the electron density and are
probably disordered. This leaves only one residue before the initial
residue of the first α-helix (H1) in the wild-type structure, and none in the
LNCaP mutant structure. On the C-terminal end, the last two residues
(918-919) are not seen in the electron density of the wild-type
structure, but only the last is missing in the mutant. In addition, since
the loop between helices 9 and 10 (residues 845-850) is not well defined,
it has been modeled as poly-alanine.
Folding and Packing

As expected, the AR LBD has the same overall three-dimensional
structure as those of the other nuclear hormone receptor
LBDs. The molecule is folded into a "helical sandwich" consisting of 10
α-helices. There are four small pieces of beta strand, forming two short
beta-sheets: one in the core of the molecule between helices 5 and 6 near
the ligand binding site, and the other formed by the loop between helices
8 and 9 and the C-terminus. This latter sheet, also seen in the PR LBD
structure, holds helix 12 in the closed, agonist conformation, close to
and capping the ligand binding site.
Lack of Dimer Formation

Studies have indicated that the estrogen, progesterone, and
androgen receptors all function as homodimers and that AR LBD forms
dimers in solution. Thus it could be expected that the AR LBD domains
might form homodimers in the crystal similar to those previously seen in
the RXR-α and estrogen receptor (ER) LBD crystal structures. In the PR
LBD structure, the two monomers in the asymmetric unit are related by
a dyad, but the two-fold-symmetric configuration is strikingly different
from that of the RXR and ER homodimers and the area buried in this
configuration is much smaller than would be expected for stable dimer
formation. In the AR LBD crystal, the ligand-binding domains are
unmistakably monomeric, and there are no twofold axes relating
domains. Moreover, the homodimer interaction seen in the structures of
ER and RXR LBDs is not possible for the AR LBD, as the C-terminal tail
is bound to the groove formed by helices 9 and 10, thereby obstructing
the contact region between monomers in RXR and ER homodimers.
Whether this observation reflects a non-dimeric state of the AR LBD in
the functional AR dimer or is an artifact of the conditions used for AR
LBD crystallization remains to be determined. It is noteworthy that the
ER LBD constructs used for crystallization have been truncated to
remove an analogous C-terminal extension.
Comparison with Progesterone Receptor

While there is only 55% sequence identity between AR LBD and
PR LBD, there is a 77% sequence similarity, and as expected, the
three-dimensional structures of these two LBDs are very similar with an r.m.s.
deviation of 1.3 Å between corresponding Cα atom positions. As with PR,
AR LBD has no helix 2, but its helix 12 is longer than those of RXR or
TR. In the case of AR, while helices 10 and 11 are nearly contiguous,
there is a proline residue at position 868 that causes a kink between the
two helices.
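
A sequence identity figure of this kind is a count of matching positions over an alignment. The sketch below is illustrative only: the `percent_identity` function is hypothetical, and the two 16-residue stretches are one-letter translations of SEQ ID NO: 1 (residues 101-116) and SEQ ID NO: 2 (residues 97-112) that are assumed here to align without gaps. The result applies to this short fragment and is not the overall identity between the two LBDs.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Assumed gap-free alignment of one stretch from each sequence listing entry
ar_fragment = "RMLYFAPDLVFNEYRM"  # SEQ ID NO: 1, residues 101-116
pr_fragment = "QMLYFAPDLILNEQRM"  # SEQ ID NO: 2, residues 97-112
print(percent_identity(ar_fragment, pr_fragment))
```

A similarity figure would additionally count conservative substitutions (for example Val/Ile), which requires choosing a substitution grouping that the text does not specify.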

Comparison with theoretical AR model

The theoretical AR model obtained from MMS/CADD and the AR
structure have an r.m.s. deviation of 1.29 Å for the 247 alpha carbons.
More importantly, the hormone binding site is virtually identical with the
exception of the side chains of Met 732(749), Leu 863(880), and Leu
864(881), which are in different rotamers. This causes the binding cavity
to be more compact in the AR structure. Also, there is a flip of the side
chain of Asn 688(705) so that the Nδ2 atom is in position to make a
hydrogen bond to the hydroxyl on the D-ring.

Table 6: Comparison of AR-LBD to PR-LBD and Theoretical Model
(r.m.s. deviation in Å; number of atoms compared in parentheses)

               Calpha        Main          Side          Total

AR vs. PR      1.22 (246)    1.27 (983)    1.80 (772)    1.53 (1,755)
AR vs. CADD    1.25 (246)    1.31 (983)    2.41 (971)    1.93 (1,954)
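
The r.m.s. deviations in table 6 follow the usual definition over paired atom positions, and the Total column is consistent with combining the Main and Side entries weighted by atom count. The sketch below is a hedged illustration: `rmsd` assumes the two structures are already superposed, and both function names are hypothetical rather than the program actually used.

```python
import math


def rmsd(coords_a, coords_b):
    """R.m.s. deviation between two equally long lists of (x, y, z) positions, already superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def combine_rmsd(groups):
    """Combine per-group (rmsd, atom_count) pairs into one r.m.s. deviation over all atoms."""
    total_sq = sum(n * r * r for r, n in groups)
    total_n = sum(n for _, n in groups)
    return math.sqrt(total_sq / total_n)


# Main-chain and side-chain entries from table 6
ar_vs_pr = combine_rmsd([(1.27, 983), (1.80, 772)])
ar_vs_cadd = combine_rmsd([(1.31, 983), (2.41, 971)])
print(f"{ar_vs_pr:.2f} {ar_vs_cadd:.2f}")
```

Under these assumptions the AR vs. PR combination rounds to the tabulated 1.53 Å, and the AR vs. CADD combination agrees with the tabulated 1.93 Å to within the rounding of the inputs.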

Binding of Dihydrotestosterone

At the end of the molecular replacement procedure with the PR
LBD structure without progesterone as search model, the largest piece of
difference electron density, at approximately the 3σ level, was found at
the progesterone-binding site. Replacing the bound progesterone agonist
(which has an acetyl group at the 17-position) with a model of
dihydrotestosterone (DHT, which has a hydroxyl group at the 17-position)
produced an even better fit to the difference electron density, indicating
that DHT binds to AR LBD in an almost identical fashion to the way
progesterone binds to PR LBD. Both agonists interact with helices 3, 5,
and 11 of their respective LBDs. Ring A, which is identical in the two
steroids, makes similar interactions with the side chains of Q711, M745,
R752 (Q725, M759, R766 in PR LBD), and a conserved water molecule.
The interactions with ring C are also similar, with close contacts to the
mainchain of L704 (L718 in PR LBD) and sidechain of N705 (N719 in PR
LBD). The contact between C18 and the Oγ1 of T877 is unique to the
wild-type AR LBD, as the corresponding cysteinyl side chain is pointed
away from the steroid in the PR LBD structure.
Since progesterone and DHT differ in the substituent on ring D,
it is expected that interactions with the respective receptors will differ in this
region. In the AR LBD structure, Nδ2 of N705 makes a hydrogen bond to
the D-ring hydroxyl of DHT. A similar interaction could be made
between progesterone and the PR LBD if there were a flip of both the
steroid acetyl group and the side chain of N719. This would place the
oxygen approximately 3.2 Å from the Nδ2 atom of Asn 719. The ligand
contact surface area is slightly larger for progesterone in PR than for
DHT in AR (483 vs. 448 Å²) but they are both considerably smaller than
the ligand contact surface area in TR (559 Å²), PPARγ (583 Å²), or the
Vitamin D receptor (677 Å²).

Figure 3 shows two orthogonal views of the omit electron density
map of dihydrotestosterone (DHT) in the hormone-binding site of
AR-LBD. There are hydrogen bonds between the steroid and the side chains
of Arg 752 and Asn 705.

Table 7: Dihydrotestosterone Contacts (3.4 Å)

Hydrogen Bonds

    O3     Arg 752  Nη2     2.89 Å   (2.77 Å)
    O3     Gln 711  Nε2     3.36 Å   (3.20 Å)
    O20    Asn 705  Nδ2     2.80 Å   (3.20 Å)
    O20    Thr 877  Oγ1     2.70 Å   (N/A)

Possible Close Contacts

    C11    Leu 704  O       3.31 Å
    C12    Asn 705  Nδ2     3.07 Å
    C17    Asn 705  Nδ2     3.34 Å
    C19    Met 745  Sδ      3.38 Å
    C18    Thr 877  Oγ1     3.07 Å
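
A contact list such as table 7 can be produced by scanning ligand-protein atom pairs against a distance cutoff. The following is a minimal sketch using the 3.4 Å cutoff from the table title; the `find_contacts` function and the coordinates are hypothetical and are not taken from Table A.

```python
import math


def find_contacts(ligand_atoms, protein_atoms, cutoff=3.4):
    """List (ligand_atom, protein_atom, distance) for every pair within the cutoff, in angstroms."""
    contacts = []
    for lig_name, lig_xyz in ligand_atoms.items():
        for prot_name, prot_xyz in protein_atoms.items():
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff:
                contacts.append((lig_name, prot_name, round(d, 2)))
    # Shortest contacts first
    return sorted(contacts, key=lambda c: c[2])


# Hypothetical coordinates, NOT taken from Table A
ligand = {"O3": (0.0, 0.0, 0.0), "C18": (5.0, 0.0, 0.0)}
protein = {
    "Arg 752 NH2": (-2.9, 0.0, 0.0),
    "Thr 877 OG1": (5.0, 3.0, 0.0),
    "Leu 704 O": (0.0, 6.0, 0.0),
}
for contact in find_contacts(ligand, protein):
    print(contact)
```

A distance cutoff alone does not distinguish hydrogen bonds from other close contacts; that classification also depends on the atom types and geometry involved.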



Comparison with Progesterone binding

Comparison of the structure of DHT in the AR-LBD with the
structure of progesterone in the PR-LBD (Williams, S.P. & Sigler, P.B.
(1998) "Atomic Structure of Progesterone Complexed with its Receptor",
Nature 393, 391) shows a similar mode of binding. Ring A, which is
identical in the two steroids, makes similar interactions with the side
chains of Q711, M745 and R752 and with a conserved water molecule
(table 8). The interactions with ring C are also similar, with close
contacts to the mainchain of L704 and sidechain of N705. The contact
from C18 to the Oγ1 of T877 is unique to AR-LBD, as the corresponding
cysteine sidechain is pointed away from the steroid in the PR-LBD
structure.

Since progesterone and DHT differ in the substitution on ring
D, it is expected that there will be different interactions with the protein
in this region. In the AR structure, the Nδ2 atom of Asn 705 makes a
hydrogen bond to the D-ring hydroxyl.

A similar interaction could be made in the PR if there were a flip
of both the steroid acetyl group and the side chain of N719. This
would place the carbonyl oxygen approximately 3.2 Å from the Nδ2 atom
of Asn 719. In AR-LBD, there is also a close contact to the side chain of
T877 which is absent in the PR-LBD structure.

Figure 4 shows a comparison of AR and PR steroid binding: the
binding of dihydrotestosterone to AR-LBD (top) and of
progesterone to PR-LBD (bottom). Note that an additional hydrogen bond
interaction would be possible if the side chain of N719 and the D-ring
substituent of progesterone were both flipped.

Table 8: Comparison of AR and PR steroid binding

             AR                                    PR

Ring A
  O3:        H-bond to R752 NH2 (2.9 Å)            H-bond to R766 NH2 (2.8 Å)
             H-bond to water (3.5 Å)               H-bond to water (3.1 / 3.4 Å)
             SC of Q711 in different rotamer;      Contact to SC of Gln 725;
             distance to O3 is 3.4 and 4.13 Å      distance to O3 is 3.2 and 3.3 Å
  C19:       Contact to M745 SD (3.4 Å)            Similar orientation (3.5 Å)
  C2:        SC of Q711 (3.5 Å)                    different rotamer (3.2 & 3.3);
                                                   distance to C4 is 4.1 Å

Ring C
  C11:       L704 O (3.3 Å)                        (3.5 Å)
  C12:       Contact to N705 Nδ2 (3.1 Å)           Contact to N719 Oδ1 (3.4 Å)
  C18:       Contact T877 Oγ1 (3.1 Å)              SC of C891 pointing away;
                                                   distance to Sγ is 3.8 Å

Ring D
  O20/C21:   O20 in AR is close to C21 in PR (possible flip of acetyl in PR?)
  O20:       H-bond N705 Nδ2 (2.8 Å)               Contact to C891 Cα (3.2 Å)
             Contact T877 Oγ1 (2.7 Å)
  C21:       N/A                                   Contact to N719 Oδ1 (3.2 Å);
                                                   SC of C891 pointing away
  C17:       Contact N705 Nδ2 (3.3 Å)              Ring in slightly different orientation;
                                                   distance to N719 Oδ1 is 4.7 Å

Structure of the Complex of DHT with the LBD of the LNCaP Mutant

In the LNCaP mutant, T877 is replaced by an alanine residue.
The mutant LBD structure has an r.m.s. deviation of 0.8 Å compared to
the wild-type structure, close to the expected r.m.s. deviation due to the
estimated errors in the coordinates. In particular, the binding of DHT is
essentially identical for wild-type and mutant LBDs except at the point of
mutation. Here the replacement of T877 by alanine leaves additional
space off the D-ring of DHT to accommodate a larger substituent on
position 17. This may explain the promiscuous ability of the LNCaP
mutant, unlike wild-type AR, to bind to a variety of other hormones and
analogs like some progestins, estrogens and cortisols that differ from
DHT in substitution at position 17. For example, the binding of
flutamide, estradiol, and progesterone to the LNCaP mutant can activate
the mutant receptor. Conversely, mutation of T877 to residues with
larger sidechains such as aspartic acid and lysine would be expected
to completely preclude the binding of ligands with any substituent at
position 17 of the D-ring, and such mutations have been shown to totally
eliminate androgen binding.
We claim:

1. A crystal of an AR-LBD comprising:

a) an AR-LBD and an AR-LBD ligand or

b) an AR-LBD without an AR-LBD ligand;

wherein said crystal diffracts to at least 3 angstrom resolution and has a
crystal stability within 5% of its unit cell dimensions.

2. The crystal of claim 1 wherein said AR-LBD has at least 200 amino
acids.

3. The crystal of claim 1, wherein said AR-LBD is the AR amino acid
sequence 672 to 917 of rat AR.

4. The crystal of claim 1, wherein said AR-LBD is the AR amino acid
sequence 672 to 917 of human AR.

5. The crystal of claim 1 wherein the crystal comprises an AR-LBD and
an AR-LBD ligand and the AR-LBD ligand is an agonist or
antagonist, a partial agonist or partial antagonist, or a SARMs of the
AR-LBD.

6. The crystal of claim 5 wherein the agonist is dihydrotestosterone.

7. The crystal of claim 1 having all of the coordinates listed in Table A.

8. The crystal of claim 1 wherein said crystal comprises mammalian
AR-LBD protein.

9. The crystal of claim 1 wherein said crystal comprises rat AR-LBD
protein.

10. The crystal of claim 1 wherein said AR-LBD ligand has the
following unit cell dimensions in angstroms: a = 56.03 ± 5%, b
= 66.27 ± 5%, c = 70.38 ± 5% and an orthorhombic space group
P212121.

11. A molecule or molecular complex comprising all or any part of
the ligand binding site defined by structure coordinates of AR-LBD
amino acids V685, L700, L701, S702, S703, L704, N705, E706,
L707, G708, E709, Q711, A735, I737, Q738, Y739, S740, W741,
M742, G743, L744, M745, V746, F747, A748, M749, G750, R752,
Y763, F764, A765, L768, F770, M780, M787, I869, L873, H874,
F876, T877 and F878 according to Table A, or a mutant or
homologue of said molecule or molecular complex.
12. The molecule or molecular complex of claim 11 wherein said
mutant or homologue comprises a binding pocket that has a root
mean square deviation from the backbone atoms of said AR-LBD
amino acids of not more than 1.5 Angstroms or 30% sequence
identity with said AR-LBD amino acids.

13. A molecule or molecular complex comprising all or any part of
the ligand binding site defined by structure coordinates of AR-LBD
amino acids N705, Q711, R752, F764 and T877 according to Table A,
or a mutant or homologue of said molecule or molecular complex.

14. The molecule or molecular complex of claim 13 wherein said
mutant or homologue comprises a binding pocket that has a root
mean square deviation from the backbone atoms of said AR-LBD
amino acids of not more than 1.5 Angstroms or 30% sequence
identity with said AR-LBD amino acids.

15. A machine-readable data storage medium comprising a data
storage material encoded with machine readable data, wherein the
data is defined by the structure coordinates of an AR-LBD / AR-LBD
ligand or ligand complex according to Table A or a homologue of said
complex, wherein said homologue comprises backbone atoms that
have a root mean square deviation from the backbone atoms of the
complex of not more than 3.0 Å.

16. The machine-readable data storage medium according to claim
15, wherein said AR-LBD / AR-LBD ligand or ligand complex is
homologue having a root mean square deviation from the backbone
atoms of said amino acids of not more than 2.0 Å.

17. A machine-readable data storage medium comprising a data
storage material encoded with a first set of machine readable data
comprising a Fourier transform of at least a portion of the structural
coordinates for an AR-LBD / AR-LBD ligand according to Table A;
which, when combined with a second set of machine readable data
comprising an X-ray diffraction pattern of a molecule or molecular
complex of unknown structure, using a machine programmed with
instructions for using said first set of data and said second set of
data, can determine at least a portion of the structure coordinates
corresponding to the second set of machine readable data, said first
set of data and said second set of data.
18. A binding site in AR-LBD for an AR modulator in which a
portion of said ligand is in van der Waals contact or hydrogen
bonding contact with any portion or all of residues V685, L700,
L701, S702, S703, L704, N705, E706, L707, G708, E709, Q711,
A735, I737, Q738, Y739, S740, W741, M742, G743, L744, M745,
V746, F747, A748, M749, G750, R752, Y763, F764, A765, L768,
F770, M780, M787, I869, L873, H874, F876, T877, F878, L880,
L881, V889, F891, P892, E893, M894, M895, A896, E897, I898,
I899, S900, V901, Q902, V903, P904 or I906 of AR-LBD according to
Table A.

19. The binding site according to claim 18 wherein the AR-LBD is a
homologue or mutant with 25%-95% identity to residues V685, L700,
L701, S702, S703, L704, N705, E706, L707, G708, E709, Q711,
A735, I737, Q738, Y739, S740, W741, M742, G743, L744, M745,
V746, F747, A748, M749, G750, R752, Y763, F764, A765, L768,
F770, M780, M787, I869, L873, H874, F876, T877, F878, L880,
L881, V889, F891, P892, E893, M894, M895, A896, E897, I898,
I899, S900, V901, Q902, V903, P904 or I906 of AR-LBD according to
Table A.

20. A method of obtaining structural information about a molecule
or a molecular complex of unknown structure by using the structure
coordinates set forth in Table A, comprising the steps of:

a. generating X-ray diffraction data from said crystallized molecule
or molecular complex;

b. applying at least a portion of the structure coordinates set forth
in Table A to said X-ray diffraction pattern to generate a
three-dimensional electron density map of at least a portion of the
molecule or molecular complex; and

c. using all or a portion of the structure coordinates set forth in
Table A to generate homology models of AR-LBD or any other
nuclear hormone receptor ligand binding domain.

21. A computational method of designing an androgen receptor
synthetic ligand comprising:

a. using a three dimensional model of a crystallized protein
comprising an AR-LBD / AR-LBD ligand complex to determine
at least one interacting amino acid of the AR-LBD that
interacts with at least one first chemical moiety of the
AR-LBD ligand; and

b. selecting at least one chemical modification of said first
chemical moiety to produce a second chemical moiety with a
structure that either decreases or increases an interaction
between said interacting amino acid and said second
chemical moiety compared to said interaction between said
interacting amino acid and said first chemical moiety.

22. A method for identifying a compound that modulates androgen
receptor activity, the method comprising any combination of steps of:

a. modeling test compounds that fit spatially into the AR-LBD as
defined by structure coordinates according to Table A, or using
a three-dimensional structural model of AR-LBD, mutant
AR-LBD or AR-LBD homologue or portion thereof;

b. using said structure coordinates or ligand binding site as set
forth in claim 18 to identify structural and chemical features;

c. employing identified structural or chemical features to design or
select compounds as potential AR modulators;

d. employing the three-dimensional structural model or the ligand
binding site to design or select compounds as potential AR
modulators;

e. synthesizing the potential AR modulators;

f. screening the potential AR modulators in an assay
characterized by binding of a test compound to the AR-LBD;
and

g. modifying or replacing one or more amino acids from AR-LBD
selected from the group consisting of V685, L700, L701, S702,
S703, L704, N705, E706, L707, G708, E709, Q711, A735, I737,
Q738, Y739, S740, W741, M742, G743, L744, M745, V746,
F747, A748, M749, G750, R752, Y763, F764, A765, L768,
F770, M780, M787, I869, L873, H874, F876, T877, F878,
L880, L881, V889, F891, P892, E893, M894, M895, A896,
E897, I898, I899, S900, V901, Q902, V903, P904 or I906 of
AR-LBD according to Table A.
23. The method according to claim 22 wherein the potential AR
modulator is from a library of compounds.

24. The method according to claim 22 wherein the potential AR
modulator is selected from a database.

25. The method according to claim 22 wherein the potential AR
modulator is designed de novo.

26. The method according to claim 22 wherein the potential AR
modulator is designed from a known agonist, partial agonist,
antagonist, partial antagonist or SARMs.

27. The method according to claim 22 wherein the potential AR
modulator is an agonist or partial agonist and AR activity is
measured by translocation or unwinding or helix 12.

28. The method according to claim 22 wherein the potential AR
modulator is an antagonist or partial antagonist and AR activity is
measured by translocation or unwinding or helix 12.

29. An AR modulator identified by the method of claim 22.

30. A method for treating prostate cancer comprising administering
an effective amount of an AR modulator identified by the method of
claim 22.

31. A method for treating an age related disease comprising
administering an effective amount of an AR modulator identified by
the method of claim 22.

32. The method of claim 31 wherein said age related disease is
osteoporosis, muscle wasting or loss of libido.
ABSTRACT

The first crystal structure of the androgen receptor ligand
binding domain has been determined to 2.0 angstrom resolution.
Disclosed are the coordinates for the crystal structure, and methods for
determining agonists, partial agonists, antagonists, partial antagonists
and selective androgen receptor modulators (SARMs) of the androgen
receptor.
<110> Weinmann, Roberto
      Einspahr, Howard M.
      Krystek, Jr., Stanley R.
      Sack, John A.
      Salvati, Mark E.
      Tokarski, John S.
      Attar, Ricardo M.
      Wang, Chihuei

<120> Crystallographic Structure of the Androgen Receptor
      Ligand Binding Domain

<130> BMS-0010

<140>
<141>

<150> 60/159,394
<151> 1999-10-14

<160> 4

<170> PatentIn Ver. 2.1

<210> 1
<211> 260
<212> PRT
<213> Rattus sp.

<400> 1

Gly Ser His Met Ile Glu Gly Tyr Glu Cys Gln Pro Ile Phe Leu Asn
1               5                   10                  15

Val Leu Glu Ala Ile Glu Pro Gly Val Val Cys Ala Gly His Asp Asn
            20                  25                  30

Asn Gln Pro Asp Ser Phe Ala Ala Leu Leu Ser Ser Leu Asn Glu Leu
        35                  40                  45

Gly Glu Arg Gln Leu Val His Val Val Lys Trp Ala Lys Ala Leu Pro
    50                  55                  60

Gly Phe Arg Asn Leu His Val Asp Asp Gln Met Ala Val Ile Gln Tyr
65                  70                  75                  80

Ser Trp Met Gly Leu Met Val Phe Ala Met Gly Trp Arg Ser Phe Thr
                85                  90                  95

Asn Val Asn Ser Arg Met Leu Tyr Phe Ala Pro Asp Leu Val Phe Asn
            100                 105                 110

Glu Tyr Arg Met His Lys Ser Arg Met Tyr Ser Gln Cys Val Arg Met
        115                 120                 125

Arg His Leu Ser Gln Glu Phe Gly Trp Leu Gln Ile Thr Pro Gln Glu
    130                 135                 140

Phe Leu Cys Met Lys Ala Leu Leu Leu Phe Ser Ile Ile Pro Val Asp
145                 150                 155                 160

Gly Leu Lys Asn Gln Lys Phe Phe Asp Glu Leu Arg Met Asn Tyr Ile
                165                 170                 175

Lys Glu Leu Asp Arg Ile Ile Ala Cys Lys Arg Lys Asn Pro Thr Ser
            180                 185                 190

Cys Ser Arg Arg Phe Tyr Gln Leu Thr Lys Leu Leu Asp Ser Val Gln
        195                 200                 205

Pro Ile Ala Arg Glu Leu His Gln Phe Thr Phe Asp Leu Leu Ile Lys
    210                 215                 220

Ser His Met Val Ser Val Asp Phe Pro Glu Met Met Ala Glu Ile Ile
225                 230                 235                 240

Ser Val Gln Val Pro Lys Ile Leu Ser Gly Lys Val Lys Pro Ile Tyr
                245                 250                 255

Phe His Thr Gln
            260


<210> 2
<211> 255
<212> PRT
<213> Rattus sp.

<400> 2

Gly Gln Asp Ile Gln Leu Ile Pro Pro Leu Ile Asn Leu Leu Met Ser
1               5                   10                  15

Ile Glu Pro Asp Val Ile Tyr Ala Gly His Asp Asn Thr Lys Pro Asp
            20                  25                  30

Thr Ser Ser Ser Leu Leu Thr Ser Leu Asn Gln Leu Gly Glu Arg Gln
        35                  40                  45

Leu Leu Ser Val Val Lys Trp Ser Lys Ser Leu Pro Gly Phe Arg Asn
    50                  55                  60

Leu His Ile Asp Asp Gln Ile Thr Leu Ile Gln Tyr Ser Trp Met Ser
65                  70                  75                  80

Leu Met Val Phe Gly Leu Gly Trp Arg Ser Tyr Lys His Val Ser Gly
                85                  90                  95

Gln Met Leu Tyr Phe Ala Pro Asp Leu Ile Leu Asn Glu Gln Arg Met
            100                 105                 110

Lys Glu Ser Ser Phe Tyr Ser Leu Cys Leu Thr Met Trp Gln Ile Pro
        115                 120                 125

Gln Glu Phe Val Lys Leu Gln Val Ser Gln Glu Glu Phe Leu Cys Met
    130                 135                 140

Lys Val Leu Leu Leu Leu Asn Thr Ile Pro Leu Glu Gly Leu Arg Ser
145                 150                 155                 160

Gln Thr Gln Phe Glu Glu Met Arg Ser Ser Tyr Ile Arg Glu Leu Ile
                165                 170                 175

Lys Ala Ile Gly Leu Arg Gln Lys Gly Val Val Ser Ser Ser Gln Arg
            180                 185                 190

Phe Tyr Gln Leu Thr Lys Leu Leu Asp Asn Leu His Asp Leu Val Lys
        195                 200                 205

Gln Leu His Leu Tyr Cys Leu Asn Thr Phe Ile Gln Ser Arg Ala Leu
    210                 215                 220

Ser Val Glu Phe Pro Glu Met Met Ser Glu Val Ile Ala Ala Gln Leu
225                 230                 235                 240

Pro Lys Ile Leu Ala Gly Met Val Lys Pro Leu Leu Phe His Lys
                245                 250                 255



<210> 3
<211> 36
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: Synthetic

<400> 3
catatgattg aaggctatga atgtcaacct atcttt

<210> 4
<211> 24
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: Synthetic

<400> 4
tcactgtgtg tggaaataga tggg



